Abstract

n source and destination pairs randomly located in an area want to communicate with each other. 
Signals transmitted from one user to another at distance $r$ apart are subject to a power loss of $r^{-\alpha}$
as well as a random phase. We identify the scaling laws of the information theoretic capacity of the
network. In the case of dense networks, where the area is fixed and the density of nodes increasing, we
show that the total capacity of the network scales linearly with $n$. This improves on the best known
achievability result of $n^{2/3}$ of [1]. In the case of extended networks, where the density of nodes is fixed
and the area increasing linearly with $n$, we show that this capacity scales as $n^{2-\alpha/2}$ for $2 \le \alpha < 3$ and
$\sqrt{n}$ for $\alpha \ge 3$. The best known earlier result [2] identified the scaling law for $\alpha > 4$. Thus, much better
scaling than multihop can be achieved in dense networks, as well as in extended networks with low 
attenuation. The performance gain is achieved by intelligent node cooperation and distributed MIMO 
communication. The key ingredient is a hierarchical and digital architecture for nodal exchange of 
information for realizing the cooperation. 

I. INTRODUCTION

The seminal paper by Gupta and Kumar [3] initiated the study of scaling laws in large ad hoc
wireless networks. Their by-now-familiar model considers $n$ nodes randomly located in
the unit disk, each of which wants to communicate to a random destination node at a rate
$R(n)$ bits/second. They ask what is the maximally achievable scaling of the total throughput
$T(n) = nR(n)$ with the system size $n$. They showed that classical multihop architectures with
conventional single-user decoding and forwarding of packets cannot achieve a scaling better
than $O(\sqrt{n})$, and that a scheme that uses only nearest-neighbor communication can achieve a
throughput that scales as $\Theta(\sqrt{n/\log n})$. This gap was later closed by Franceschetti et al. [4],
who showed using percolation theory that the $\Theta(\sqrt{n})$ scaling is indeed achievable.

The Gupta-Kumar model makes certain assumptions on the physical-layer communication 
technology. In particular, it assumes that the signals received from nodes other than one particular 
transmitter are interference to be regarded as noise degrading the communication link. Given 
this assumption, direct communication between source and destination pairs is not preferable, as 
the interference generated would preclude most of the other nodes from communicating. Instead, 
the optimal strategy is to confine to nearest neighbor communication and maximize the number 
of simultaneous transmissions (spatial reuse). However, this means that each packet has to be 
retransmitted many times before getting to the final destination, leading to a sub-linear scaling 
of system throughput. 

A natural question is whether the Gupta-Kumar scaling law is a consequence of the physical- 
layer technology or whether one can do better using more sophisticated physical-layer processing. 
More generally, what is the information-theoretic scaling law of ad hoc networks? This question 
was first addressed by Xie and Kumar [5]. They showed that whenever the power path loss
exponent $\alpha$ of the environment is greater than 6 (i.e. the received power decays faster than
$r^{-6}$ with the distance $r$ from the transmitter), then the nearest-neighbor multihop scheme is in
fact order-optimal. They also showed that the same conclusion holds if the power path loss is
exponential in the distance $r$, a channel model proposed recently by Franceschetti et al. [6]. The
work [5] was followed by several others [7], [8], [9], [2], [10]. Successively, they improved the
threshold on the path loss exponent $\alpha$ for which multihop is order-optimal ($\alpha > 5$ in [7], $\alpha > 4.5$
in [10] and $\alpha > 4$ in [2]). However, the question is open for the important range of $\alpha$ between
2 and 4, with $\alpha = 2$ corresponding to free-space attenuation.

There is an important difference between the network model used in [3] and that used in 
[5] and the follow-up works. The paper [3] deals with dense networks, where the total area is 
fixed and the density of nodes increases. The paper [5] and the subsequent works, on the other 
hand, focus on extended networks, which scale to cover an increasing area with the density of 
nodes fixed. A way to understand the difference between the engineering implications of these
two network scalings is by drawing a parallel with the classical notions of interference-limitedness
and coverage-limitedness, the two operating regimes of cellular networks. Cellular
networks in urban areas tend to have dense deployments of base-stations so that signals are
received at the mobiles with sufficient signal-to-noise ratio (SNR) but performance is limited
by interference between transmissions in adjacent cells. Cellular networks in rural areas, on
the other hand, tend to have sparse deployments of base-stations so that performance is mainly
limited by the ability to transmit enough power to reach all the users with sufficient
signal-to-noise ratio. Analogously, in the dense network scaling, all nodes can communicate with each
other with sufficient SNR; performance can only be limited by interference, if at all. The
Gupta-Kumar scaling of $\sqrt{n}$ comes precisely from such interference limitation. In the extended network
scaling, the source and destination pairs are at increasing distance from each other, and so both
interference limitation and power limitation can come into play. The network can be either
coverage-limited or interference-limited. The information-theoretic limits on performance proved
in [5], [7], [8], [9], [2], [10] are all based on bounding the maximum amount of power that can be
transferred across the network and then showing that multihop achieves that bound. Hence, what
was shown by these works is that for $\alpha > 4$, when signals attenuate fast enough, the extended
network is fundamentally coverage-limited: even with optimal cooperative relaying, the amount
of power transferred across the network cannot be larger than that achieved by multihop. For
$\alpha$ between 2 and 4, when attenuation is lower and power transfer becomes easier, the question
remains open whether the network is coverage-limited or interference-limited.

Viewing the earlier results in this light, a natural first step in completing the picture is to return 
to the simpler dense network as a vehicle to focus exclusively on the issue of interference. Can the 
interference limitation implied by the Gupta-Kumar result be overcome by more sophisticated 
physical-layer processing? In a recent work [1], Aeron and Saligrama showed that the
answer is indeed yes: they exhibited a scheme which yields a throughput scaling of $\Theta(n^{2/3})$
bits/second. However, it is not clear if one can do even better. The first main result in this paper
is that, for any value of $\alpha \ge 2$, one can in fact achieve arbitrarily close to linear scaling: for
any $\epsilon > 0$, we present a scheme that achieves an aggregate rate of $\Theta(n^{1-\epsilon})$. This is a surprising
result: a linear scaling means that there is essentially no interference-limitation; the rate for each
source-destination pair does not degrade significantly even as one puts more and more nodes in
the network. It is easy to show that one cannot get a better capacity scaling than $O(n \log n)$, so
our scheme is close to optimal.

To achieve linear scaling, one must be able to perform many simultaneous long-range
communications. A physical-layer technique which achieves this is MIMO (multi-input multi-output):
the use of multiple transmit and receive antennas to multiplex several streams of data and transmit 
them simultaneously. MIMO was originally developed in the point-to-point setting, where the 
transmit antennas are co-located at a single transmit node, each transmitting one data stream, 
and the receive antennas are co-located at a single receive node, jointly processing the vector of 
received observations at the antennas. A natural approach to apply this concept to the network 
setting is to have both source nodes and destination nodes cooperate in clusters to form distributed 
transmit and receive antenna arrays respectively. In this way, mutually interfering signals can be 
turned into useful ones that can be jointly decoded at the receive cluster and spatial multiplexing 
gain can be realized. In fact, if all the nodes in the network could cooperate for free, then a 
classical MIMO result [11], [12] says that a sum rate scaling proportional to n could be achieved. 
However, this may be over-optimistic: communication between nodes is required to set up the
cooperation and this may drastically reduce the useful throughput. The Aeron-Saligrama scheme 
is MIMO-based and its performance is precisely limited by the cooperation overhead between 
receive nodes. Our main contribution is to introduce a new multi-scale, hierarchical cooperation 
architecture without significant overhead. Such cooperation first takes place between nodes within 
very small local clusters to facilitate MIMO communication over a larger spatial scale. This can 
then be used as a communication infrastructure for cooperation within larger clusters at the next 
level of the hierarchy. Continuing in this fashion, cooperation can be achieved at an almost
global scale. 

The result in the dense network builds the foundation for understanding the extended network 
in the low-attenuation regime of the path loss exponent $\alpha$ between 2 and 4. Cooperative MIMO
communication provides not only a degree of freedom gain but also a power gain, obtained 
by combining signals received at the different nodes. This power gain is not very important in 
the dense setting, since there is already sufficient SNR in any direct communication between 
individual nodes and the capacity is only logarithmic in the SNR. In the extended setting, 
however, this power gain becomes very important, since the power transferred between an 
individual source and destination pair vanishes due to channel attenuation. The operation is 
in the low SNR regime where the capacity is linear in the SNR. Cooperation between nodes 
can significantly boost the power transfer. In fact, it can be shown that the capacity of long
range $n$ by $n$ cooperative MIMO transmission scales exactly like the total received power. This
total received power scales like $n^{2-\alpha/2}$. We show that a simple modification of our hierarchical
cooperation scheme to the extended setting can achieve a network total throughput arbitrarily
close to this cooperative MIMO scaling. Thus, for $\alpha < 3$, our scheme performs strictly better
than multihop.

Can we do better? Recall that earlier results in [5], [7], [8], [9], [2], [10] are all based on
upper bounding the amount of power transferred across cutsets of the network. It turns out that
their upper bounds are tight when $\alpha > 4$ but not tight for $\alpha$ between 2 and 4. By evaluating
exactly the scaling of the power transferred, we show that it matches the performance of the
hierarchical scheme for $\alpha$ between 2 and 3 and that of the multihop scheme for $\alpha > 3$. More
precisely, we obtain the following tight characterization for the scaling exponent for all $\alpha$ in the
extended case:

$$e(\alpha) := \lim_{n \to \infty} \frac{\log C_n(\alpha)}{\log n} = \begin{cases} 2 - \alpha/2, & 2 \le \alpha \le 3 \\ 1/2, & \alpha > 3 \end{cases}$$

where $C_n(\alpha)$ is the total capacity of the network. In particular, when $\alpha = 2$, linear capacity
scaling can be achieved, even in the extended case. Note that the capacity is limited by the 
power transferred for all $\alpha \ge 2$; hence the extended network is fundamentally coverage-limited,
even for $\alpha$ between 2 and 4. For $\alpha > 3$, multihop is sufficient in transferring the optimal amount
of power; for $\alpha < 3$, when the attenuation is slower, cooperative MIMO is needed to provide
the power gain and also enough degrees of freedom to operate in the power-efficient regime. 
Just like in the dense setting, interference limitation does not play a significant role, as far as 
capacity scaling is concerned. Cooperative MIMO takes care of that. 
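
As a rough illustration of this characterization, the following Python sketch evaluates the extended-network scaling exponent stated in the abstract ($n^{2-\alpha/2}$ for $\alpha$ between 2 and 3, $\sqrt{n}$ beyond) next to the multihop exponent 1/2. The function and constant names are hypothetical, introduced only for this example.

```python
def extended_scaling_exponent(alpha):
    """Exponent e such that total capacity scales like n**e (extended network)."""
    if alpha < 2:
        raise ValueError("the model assumes alpha >= 2")
    # Low attenuation: cooperative MIMO regime; otherwise multihop suffices.
    return 2 - alpha / 2 if alpha <= 3 else 0.5


MULTIHOP_EXPONENT = 0.5  # sqrt(n) scaling of nearest-neighbor multihop

for alpha in (2, 2.5, 3, 4):
    e = extended_scaling_exponent(alpha)
    # The difference is the exponent of the gain over multihop.
    print(f"alpha={alpha}: exponent {e}, gain exponent {e - MULTIHOP_EXPONENT}")
```

The two branches agree at $\alpha = 3$, so the exponent is continuous in $\alpha$, and the gain over multihop vanishes exactly there.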

Our approach to the problem is to first look at the dense case to isolate the issue of interference 
and then to tackle the extended case. But the dense scaling is also of interest in its own right. It is
relevant whenever one wants to design networks to serve many nodes, all within communication 
range of each other (within a campus, an urban block, etc.). This scaling is also a reasonable 
model to study problems such as spectrum sharing, where many users in a geographical area are 
sharing a wide band of spectrum. Consider the scenario where we segregate the total bandwidth 
into many orthogonal bands, one for each separate network supporting a fixed number of users. 
As we increase the number of users, the number of such segregated networks increases but the 
spectral efficiency, in bits/s/Hz, does not scale with the total number of users. In contrast, if we 
build one large ad hoc network for all the users on the entire bandwidth, then our result says that
the spectral efficiency actually increases linearly with the number of users. The gain is coming
from a network effect via cooperation between the many nodes in the system. 

The rest of the paper is organized as follows. In Section II, we present the model and discuss
the various assumptions. Section III contains the main result for dense networks and an outline
of the proposed architecture together with a back-of-the-envelope analysis of its performance.
The details of its performance analysis are given in Section IV. Section V characterizes the
scaling law for extended networks. Section VI discusses the limitations of the model and the
results. Section VII contains our conclusions.

II. Model

There are $n$ nodes uniformly and independently distributed in a square of unit area in the dense
scaling (Sections III and IV) and a square of area $n$ in the extended scaling (Section V). Every
node is both a source and a destination. The sources and destinations are paired up one-to-one
in an arbitrary fashion. Each source has the same traffic rate $R(n)$ to send to its destination node
and a common average transmit power budget of $P$ Watts. The total throughput of the system
is $T(n) = nR(n)$.^1

^1 In the sequel, whenever we say a total throughput $T(n)$ is achievable, we implicitly mean that a rate of $T(n)/n$ is achievable
for every source-destination pair.

We assume that communication takes place over a flat channel of bandwidth $W$ Hz around a
carrier frequency of $f_c \gg W$. The complex baseband-equivalent channel gain between node
$i$ and node $k$ at time $m$ is given by:

$$H_{ik}[m] = \sqrt{G}\, r_{ik}^{-\alpha/2} \exp(j\theta_{ik}[m]) \qquad (1)$$

where $r_{ik}$ is the distance between the nodes, $\theta_{ik}[m]$ is the random phase at time $m$, uniformly
distributed in $[0, 2\pi]$, and $\{\theta_{ik}[m], 1 \le i \le n, 1 \le k \le n\}$ is a collection of i.i.d. random
processes. The $\theta_{ik}[m]$'s and the $r_{ik}$'s are also assumed to be independent. The parameters $G$
and $\alpha \ge 2$ are assumed to be constants; $\alpha$ is called the path loss exponent. For example, under
free-space line-of-sight propagation, Friis' formula applies and

$$|H_{ik}[m]|^2 = \frac{G_{Tx} \cdot G_{Rx}}{(4\pi r_{ik}/\lambda_c)^2} \qquad (2)$$

so that

$$G = \frac{G_{Tx} \cdot G_{Rx} \cdot \lambda_c^2}{16\pi^2}, \qquad \alpha = 2,$$

where $G_{Tx}$ and $G_{Rx}$ are the transmitter and receiver antenna gains respectively and $\lambda_c$ is the
carrier wavelength.
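
The relation between the general channel model (1) and Friis' formula (2) can be checked numerically. The sketch below is only illustrative: the function names and the numerical values (unit antenna gains, a 12.5 cm wavelength, a 10 m separation) are assumptions made for the example and do not come from the paper.

```python
import cmath
import math


def friis_gain_constant(g_tx, g_rx, wavelength):
    """G = G_Tx * G_Rx * lambda_c^2 / (16 pi^2), the free-space case (alpha = 2)."""
    return g_tx * g_rx * wavelength ** 2 / (16 * math.pi ** 2)


def channel_gain(G, r, alpha, theta):
    """H = sqrt(G) * r^(-alpha/2) * exp(j * theta), following equation (1)."""
    return math.sqrt(G) * r ** (-alpha / 2) * cmath.exp(1j * theta)


# Illustrative (assumed) values.
g_tx, g_rx, wavelength, r = 1.0, 1.0, 0.125, 10.0

G = friis_gain_constant(g_tx, g_rx, wavelength)
h = channel_gain(G, r, alpha=2.0, theta=0.7)

# Power gain computed directly from Friis' formula, equation (2).
friis_power = g_tx * g_rx / (4 * math.pi * r / wavelength) ** 2

print(abs(h) ** 2, friis_power)  # the two values should coincide
```

The phase only rotates the gain in the complex plane, so the squared magnitude does not depend on `theta`.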

Note that the channel is random, depending on the location of the users and the phases.
The locations are assumed to be fixed over the duration of the communication. The phases are
assumed to vary in a stationary ergodic manner (fast fading).^2 We assume that the channel gains
are known at all the nodes. The signal received by node $i$ at time $m$ is given by

$$Y_i[m] = \sum_{k=1}^{n} H_{ik}[m] X_k[m] + Z_i[m]$$

where $X_k[m]$ is the signal sent by node $k$ at time $m$ and $Z_i[m]$ is white circularly symmetric
Gaussian noise of variance $N_0$ per symbol.

Several comments about the model are in order:

• The path loss model is based on a far-field assumption: the distance is assumed to be
much larger than the carrier wavelength. When the distance is of the order of or shorter than
the carrier wavelength, the simple path loss model no longer holds, as path
loss can potentially become path "gain". The reason is that near-field electromagnetics then
come into play.

• The phase $\theta_{ik}[m]$ depends on the distance between the nodes modulo the carrier
wavelength [13]. The random phase model is thus also based on a far-field assumption: we are
assuming that the nodes' separation is at a much larger spatial scale compared to the carrier
wavelength, so that the phases can be modelled as completely random and independent of
the actual positions.

• It is realistic to assume that the phases vary, since they change significantly when users
move a distance of the order of the carrier wavelength (fractions of a meter). The positions
determine the path losses, which on the other hand vary over a much larger spatial scale.
So the positions are assumed to be fixed.

• We essentially assume a line-of-sight type environment and ignore multipath effects. The
randomness in phases is sufficient for the long range MIMO transmissions needed in our
scheme. With multipaths, there is a further randomness due to random constructive and
destructive interference of these paths. It can be seen that our results easily extend to the
multipath case.

^2 With more technical effort, we believe our results can be extended to the slow fading setting where the phases are fixed as
well. See the remark at the end of Appendix I for further discussion on this point.

We will discuss further the limitations of this model in Section VI after we present our results.

III. Main Result for Dense Networks 

We first give an information-theoretic upper bound on the achievable scaling law for the 
aggregate throughput in the network. Before starting to look for good communication strategies, 
Theorem 3.1 establishes the best we can hope for.

Theorem 3.1: The aggregate throughput in the network with $n$ nodes is bounded above by

$$T(n) \le K' n \log n$$

with high probability^3 for some constant $K' > 0$ independent of $n$.

^3 i.e. probability going to 1 as system size grows.

Proof: Consider a source-destination pair $(s, d)$ in the network. The transmission rate $R(n)$
from source node $s$ to destination node $d$ is upper bounded by the capacity of the single-input
multiple-output (SIMO) channel between source node $s$ and the rest of the network. Using a
standard formula for this channel (see e.g. [13]), we get:

$$R(n) \le \log\left(1 + \frac{P}{N_0} \sum_{i=1, i \ne s}^{n} |H_{is}|^2\right) = \log\left(1 + \frac{P}{N_0} \sum_{i=1, i \ne s}^{n} \frac{G}{r_{is}^{\alpha}}\right).$$

It is easy to see that in a random network with $n$ nodes uniformly distributed on a fixed
two-dimensional area, the minimum distance between any two nodes in the network is larger than
$\frac{1}{n^{1+\delta}}$ with high probability, for any $\delta > 0$. Consider one specific node in the network which
is at distance larger than $\frac{1}{n^{1+\delta}}$ to all other nodes in the network. This is equivalent to saying
that there are no other nodes inside a circle of area $\frac{\pi}{n^{2+2\delta}}$ around this node. The probability of
such an event is $\left(1 - \frac{\pi}{n^{2+2\delta}}\right)^{n-1}$. Moreover, the minimum distance between any two nodes in the
network is larger than $\frac{1}{n^{1+\delta}}$ only if this condition is satisfied for all nodes in the network. Thus
by the union bound we have

$$\mathbb{P}\left(\text{minimum distance in the network is smaller than } \frac{1}{n^{1+\delta}}\right) \le n \left(1 - \left(1 - \frac{\pi}{n^{2+2\delta}}\right)^{n-1}\right)$$

which decreases to zero as $1/n^{2\delta}$ with increasing $n$.

Hence using this fact on the minimum distance in the network, we obtain

$$R(n) \le \log\left(1 + \frac{GP}{N_0} n^{\alpha(1+\delta)+1}\right) \le K' \log n$$

for some constant $K' > 0$ independent of $n$ for all source-destination pairs in the network with
high probability. The theorem follows. □
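
The decay rate of the union bound used in the proof can be checked numerically. The following sketch (the function name is hypothetical and the values of $n$ and $\delta$ are chosen only for illustration) evaluates $n\left(1 - (1 - \pi/n^{2+2\delta})^{n-1}\right)$ and compares it with the leading-order term $\pi/n^{2\delta}$.

```python
import math


def min_distance_union_bound(n, delta):
    """Evaluate n * (1 - (1 - pi / n^(2 + 2*delta))^(n - 1))."""
    p = math.pi / n ** (2 + 2 * delta)  # area of the empty disc around one node
    # log1p and expm1 keep precision when p is very small.
    return n * -math.expm1((n - 1) * math.log1p(-p))


delta = 0.5  # illustrative choice
for n in (10**2, 10**3, 10**4):
    bound = min_distance_union_bound(n, delta)
    leading = math.pi / n ** (2 * delta)
    print(n, bound, leading, bound / leading)
```

The ratio in the last column approaches 1 from below, which is consistent with the bound vanishing like $1/n^{2\delta}$.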

In view of what is ultimately possible, as established by Theorem 3.1, we are now ready to
state the main result of this paper.

Theorem 3.2: Let $\alpha \ge 2$. For any $\epsilon > 0$, there exists a constant $K_\epsilon > 0$ independent of $n$
such that with high probability, an aggregate throughput

$$T(n) \ge K_\epsilon n^{1-\epsilon}$$

is achievable in the network for all possible pairings between sources and destinations.

Theorem 3.2 states that it is actually possible to perform arbitrarily close to the bound given
in Theorem 3.1. The two theorems together establish the capacity scaling for the network up to
logarithmic terms. Note how dramatically different this new linear capacity scaling law is from
the well-known throughput scaling of $\Theta(\sqrt{n})$ implied by [3], [4] for the same model. Note also
that the upper bound in Theorem 3.1 assumes a genie-aided removal of interference between
simultaneous transmissions from different sources. By proving Theorem 3.2, we will show that
it is possible to mitigate such interference without a genie but with cooperation between the
nodes.

The proof of Theorem 3.2 relies on the construction of an explicit scheme that realizes the
promised scaling law. The construction is based on recursively using the following key lemma,
which addresses the case when $\alpha > 2$.

Lemma 3.1: Consider $\alpha > 2$ and a network with $n$ nodes subject to interference from external
sources. The signal received by node $i$ is given by

$$Y_i = \sum_{k=1}^{n} H_{ik} X_k + Z_i + I_i$$

where $I_i$ is the external interference signal received by node $i$. Assume that $\{I_i, 1 \le i \le n\}$ is
a collection of uncorrelated zero-mean stationary and ergodic random processes with power $P_{I_i}$
upper bounded by

$$P_{I_i} \le K_I, \qquad 1 \le i \le n$$

for a constant $K_I > 0$ independent of $n$. Let us assume there exists a scheme that, for each
$n$, with probability at least $1 - e^{-n^{c_1}}$ achieves an aggregate throughput

$$T(n) \ge K_1 n^{b}$$

for every possible source-destination pairing in this network of $n$ nodes. $K_1$ and $c_1$ are positive
constants independent of $n$ and the source-destination pairing, and $0 \le b < 1$. Let us also assume
that the per node average power budget required to realize this scheme is upper bounded by
$P/n$ as opposed to $P$.

Then one can construct another scheme for this network that achieves a higher aggregate
throughput

$$T(n) \ge K_2 n^{\frac{1}{2-b}}$$

for every source-destination pairing in the network, where $K_2 > 0$ is another constant independent
of $n$ and the pairing. Moreover, the failure rate for the new scheme is upper bounded by
$e^{-n^{c_2}}$ for another positive constant $c_2$ while the per node average power needed to realize the
scheme is also upper bounded by $P/n$.

Lemma 3.1 is the key step to build a hierarchical architecture. Since $\frac{1}{2-b} > b$ for $0 \le b < 1$,
the new scheme is always better than the old. We will now give a rough description of how the
new scheme can be constructed given the old scheme, as well as a back-of-the-envelope analysis
of the scaling law it achieves. The next section is devoted to its precise description and performance
analysis.

The constructed scheme is based on clustering and long-range MIMO transmissions between 
clusters. We divide the network into clusters of M nodes. Let us focus for now on a particular 
source node $s$ and its destination node $d$. The source $s$ will send $M$ bits to $d$ in three steps:

(1) Node $s$ will distribute its $M$ bits among the $M$ nodes in its cluster, one for each node;

(2) These nodes together can then form a distributed transmit antenna array, sending the $M$
bits simultaneously to the destination cluster where $d$ lies;

(3) Each node in the destination cluster obtains one observation from the MIMO transmission,
and it quantizes and ships the observation back to $d$, which can then do joint MIMO
processing of all the observations and decode the $M$ transmitted bits.

From the network point of view, all source-destination pairs have to eventually accomplish
these three steps. Step 2 is long-range communication and only one source-destination pair can
operate at the same time. Steps 1 and 3 involve local communication and can be parallelized
across source-destination pairs. Combining all this leads to three phases in the operation of the
network:

Phase 1: Setting Up Transmit Cooperation. Clusters work in parallel. Within a cluster, each
source node has to distribute $M$ bits to the other nodes, 1 bit for each node, such that at the end
of the phase each node has 1 bit from each of the source nodes in the same cluster. Since there
are $M$ source nodes in each cluster, this gives a traffic demand of exchanging $M^2$ bits. (Recall
our assumption that each node is a source for some communication request and destination for
another.) The key observation is that this is similar to the original problem of communicating
between $n$ source and destination pairs, but on a network of size $M$. More specifically, this
traffic demand of exchanging $M^2$ bits is handled by setting up $M$ sub-phases, and assigning
$M$ source-destination pairs for each sub-phase. Since our channel model is scale invariant, note
that the scheme given in the hypothesis of the lemma can be used in each sub-phase by simply
scaling down the power with cluster area. Having aggregate throughput $M^{b}$, each sub-phase is
completed in $M^{1-b}$ time slots while the whole phase takes $M^{2-b}$ time slots. See Figure 1.

Phase 2: MIMO Transmissions. We perform successive long-distance MIMO transmissions
between source-destination pairs, one at a time. In each one of the MIMO transmissions, say
one between $s$ and $d$, the $M$ bits of $s$ are simultaneously transmitted by the $M$ nodes in its
cluster to the $M$ nodes in the cluster of $d$. The long-distance MIMO transmission is
repeated for each source node in the network, hence we need $n$ time slots to complete the phase.
See Figure 2.

Phase 3: Cooperate to Decode. Clusters work in parallel. Since there are $M$ destination nodes
inside the clusters, each cluster received $M$ MIMO transmissions in Phase 2, one intended for
each of the destination nodes in the cluster. Thus, each node in the cluster has $M$ received
observations, one from each of the MIMO transmissions, and each observation is to be conveyed
to a different node in its cluster. Nodes quantize each observation into a fixed number $Q$ of bits, so there are
now a total of at most $QM^2$ bits to exchange inside each cluster. Using exactly the same scheme
as in Phase 1, we conclude the phase in $QM^{2-b}$ time slots. See again Figure 1.

Fig. 1. Nodes inside clusters F, G, H and J are illustrated while exchanging bits in Phases 1 and 3. Note that in Phase 1 the
exchanged bits are the source bits whereas in Phase 3 they are the quantized MIMO observations. Clusters work in parallel. In
this and the following figure, Fig. 2, we highlight three source-destination pairs $s_1$-$d_1$, $s_2$-$d_2$ and $s_3$-$d_3$, such that nodes
$s_1$ and $d_3$ are located in F, nodes $s_2$ and $s_3$ are located in H and J respectively, and nodes $d_1$ and $d_2$ are located in G.

Fig. 2. Successive MIMO transmissions are performed between clusters. The first figure depicts MIMO transmission from
cluster F to G, where bits originally belonging to $s_1$ are simultaneously transmitted by all nodes in F to all nodes in G. The
second MIMO transmission is from H to G, while now bits of source node $s_2$ are transmitted by nodes in H to nodes in G.
The third picture illustrates MIMO transmission from cluster J to F.

Assuming that each destination node is able to decode the transmitted bits from its source
node from the $M$ quantized signals it gathers by the end of Phase 3, we can calculate the rate
of the scheme as follows: Each source node is able to transmit $M$ bits to its destination node,
hence $nM$ bits in total are delivered to their destinations in $M^{2-b} + n + QM^{2-b}$ time slots,
yielding an aggregate throughput of

$$\frac{nM}{M^{2-b} + n + QM^{2-b}}$$

bits per time slot. Maximizing this throughput by choosing $M = n^{\frac{1}{2-b}}$ yields $T(n) = \frac{1}{2+Q} n^{\frac{1}{2-b}}$ for
the aggregate throughput, which is the result in Lemma 3.1.
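
The back-of-the-envelope calculation above can be reproduced with a few lines of Python. This is a minimal sketch under the idealized slot counts of the three phases; the function names and the numerical values of $n$, $b$ and $Q$ are assumptions chosen only for illustration.

```python
def phase_durations(n, M, b, Q):
    """Time slots used by Phases 1, 2 and 3 of the three-phase scheme."""
    phase1 = M ** (2 - b)        # M sub-phases of M^(1-b) slots each
    phase2 = n                   # one long-range MIMO transmission per source
    phase3 = Q * M ** (2 - b)    # Q times the traffic of Phase 1
    return phase1, phase2, phase3


def aggregate_throughput(n, M, b, Q):
    """Bits delivered per time slot: n*M bits over the total duration."""
    return n * M / sum(phase_durations(n, M, b, Q))


n, b, Q = 10**6, 0.0, 4          # illustrative values, not taken from the paper
M = n ** (1 / (2 - b))           # cluster size that makes Phase 1 last n slots

print("phase durations:", phase_durations(n, M, b, Q))
print("throughput:", aggregate_throughput(n, M, b, Q))
print("M / (2 + Q):", M / (2 + Q))
```

With this choice of $M$, Phases 1 and 2 take the same number of slots and Phase 3 takes $Q$ times as many, so the throughput reduces to $M/(2+Q)$.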

Clusters can work in parallel in Phases 1 and 3 because for $\alpha > 2$, the aggregate interference
at a particular cluster caused by other active nodes is bounded; moreover, the interference
signals received by different nodes in the cluster are zero-mean and uncorrelated, satisfying
the assumptions of Lemma 3.1. For $\alpha = 2$, the aggregate interference scales like $\log n$, leading
to a slightly different version of the lemma.

Lemma 3.2: Consider $\alpha = 2$ and a network with $n$ nodes subject to interference from external
sources. The signal received by node $i$ is given by

$$Y_i = \sum_{k=1}^{n} H_{ik} X_k + Z_i + I_i$$

where $I_i$ is the external interference signal received by node $i$. Assume that $\{I_i, 1 \le i \le n\}$ is
a collection of uncorrelated zero-mean stationary and ergodic random processes with power $P_{I_i}$
upper bounded by

$$P_{I_i} \le K_I \log n, \qquad 1 \le i \le n$$

for a constant $K_I > 0$ independent of $n$. Let us assume there exists a scheme that, for
each $n$, with failure probability at most $e^{-n^{c_1}}$ achieves an aggregate throughput

$$T(n) \ge K_1 \frac{n^{b}}{\log n}$$

for every source-destination pairing in this network. $K_1$ and $c_1$ are positive constants independent
of $n$ and the source-destination pairing, and $0 \le b < 1$. Let us also assume that the average
power budget required to realize this scheme is upper bounded by $P/n$, as opposed to $P$.

Then one can construct another scheme for this network that achieves a higher aggregate
throughput scaling

$$T(n) \ge K_2 \frac{n^{\frac{1}{2-b}}}{(\log n)^2}$$

for every source-destination pairing, where $K_2 > 0$ is another constant independent of $n$ and the
pairing. Moreover, the failure rate for the new scheme is upper bounded by $e^{-n^{c_2}}$ for another
positive constant $c_2$ while the per node average power needed to realize the scheme is also upper
bounded by $P/n$.

We can now use Lemmas 3.1 and 3.2 to prove Theorem 3.2.

Proof of Theorem 3.2: We focus only on the case $\alpha > 2$. The case $\alpha = 2$ proceeds similarly.

We start by observing that the simple scheme of transmitting directly between the
source-destination pairs one at a time (TDMA) satisfies the requirements of the lemma. The aggregate
throughput is $\Theta(1)$, so $b = 0$. The failure probability is 0. Since each source is only transmitting
$\frac{1}{n}$-th of the time and the distance between the source and its destination is bounded, the average
power consumed per node is of the order of $\frac{1}{n}$.

As soon as we have a scheme to start with, Lemma 3.1 can be applied recursively, yielding
a scheme that achieves higher throughput at each step of the recursion. More precisely, starting
with a TDMA scheme with $b = 0$ and applying Lemma 3.1 recursively $h$ times, one gets a
scheme achieving $\Theta(n^{\frac{h}{h+1}})$ aggregate throughput. Given any $\epsilon > 0$, we can now choose $h$ such
that $\frac{h}{h+1} \ge 1 - \epsilon$ and we get a scheme that achieves $\Theta(n^{1-\epsilon})$ aggregate throughput scaling with
high probability. This concludes the proof of Theorem 3.2. □
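
The recursion in the proof can be traced explicitly. The sketch below iterates the exponent map $b \mapsto \frac{1}{2-b}$ of Lemma 3.1 starting from the TDMA exponent $b = 0$, using exact rational arithmetic; the function names are hypothetical, and $\epsilon$ is passed as a `Fraction` so that the comparison with $1 - \epsilon$ is exact.

```python
from fractions import Fraction


def exponent_after(h):
    """Throughput exponent after applying the lemma h times, starting from b = 0."""
    b = Fraction(0)
    for _ in range(h):
        b = 1 / (2 - b)          # one application of the lemma: b -> 1/(2-b)
    return b


def levels_needed(eps):
    """Smallest h whose exponent is at least 1 - eps (eps given as a Fraction)."""
    h = 0
    while exponent_after(h) < 1 - eps:
        h += 1
    return h


for h in range(5):
    print(h, exponent_after(h))  # 0, 1/2, 2/3, 3/4, 4/5

print(levels_needed(Fraction(1, 10)))
```

The printed exponents follow the pattern $h/(h+1)$, so reaching $n^{1-\epsilon}$ requires on the order of $1/\epsilon$ levels of hierarchy.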

Gathering everything together, we have built a hierarchical scheme to achieve the desired 
throughput. At the lowest level of the hierarchy, we use the simple TDMA scheme to exchange 
bits for cooperation among small clusters. Combining this with longer range MIMO transmis- 
sions, we get a higher throughput scheme for cooperation among nodes in larger clusters at the 
next level of the hierarchy. Finally, at the top level of the hierarchy, the cooperation clusters are 
almost the size of the network and the MIMO transmissions are over the global scale to meet 
the desired traffic demands. Figure 3 shows the resulting hierarchical scheme with a focus on 
the top two levels. 

It is important to understand the aspects of the channel model which the scheme made use of 
in achieving the linear capacity scaling: 

• the random channel phases enable the long-range MIMO transmissions. 

• the path attenuation decay law $1/r^\alpha$ ($\alpha > 2$) ensures that the aggregate signals from far
away nodes are much weaker than signals from close-by nodes. This enables spatial reuse.

Note that the second property is exactly the same one which allows multi-hop schemes to
achieve the $\sqrt{n}$-scaling in the paper by Gupta-Kumar [3] and in many others after that. Although
the gain between nearby nodes becomes unbounded as $n \to \infty$ in the model, the received
signal-to-interference-plus-noise ratio (SINR) is always bounded in the scheme at all levels of
the hierarchy. The scheme does not communicate with unbounded SINR, although it is possible
in the model.

Fig. 3. The time division in a hierarchical scheme as well as the salient features of the three phases are illustrated.



IV. Detailed Description and Performance Analysis 

In this section, we concentrate in more detail on the scheme that proves Lemma 3.1 and
Lemma 3.2. We first focus on Lemma 3.1 and then extend the proof to Lemma 3.2. As we
have already seen in the previous section, we start by dividing the unit square into smaller
squares of area $A_c = M/n$. Since the node density is $n$, there will be on average $M$ nodes inside
each of these small squares. The following lemma upper bounds the probability of having large
deviations from the average.

Lemma 4.1: Let us partition a unit area network of size $n$ into cells of area $A_c$, where $A_c$ can
be a function of $n$. The number of nodes inside each cell is between $((1-\delta)A_c n,\ (1+\delta)A_c n)$
with probability larger than $1 - \frac{1}{A_c} e^{-\Lambda(\delta) A_c n}$, where $\Lambda(\delta)$ is independent of $n$ and satisfies
$\Lambda(\delta) > 0$ when $\delta > 0$.

Applying Lemma 4.1 to the squares of area $M/n$, we see that all squares contain order $M$
nodes with probability larger than $1 - \frac{n}{M} e^{-\Lambda(\delta) M}$. We assume $M = n^\gamma$ where $0 < \gamma < 1$, in which
case this probability tends to 1 as $n$ increases. In the following discussion, we will need a
stronger result, namely that each of the 8 possible halves of a square should contain order $M/2$
nodes with high probability, which again follows from the lemma together with the union bound.
This condition is sufficient for our analysis of scaling laws below to hold. However, in order
to simplify the presentation, we assume that there are exactly $M/2$ nodes inside each half, thus
exactly $M$ nodes in each square. The clustering is used to realize a distributed MIMO system
in three successive steps:

Phase 1: Setting Up Transmit Cooperation. In this phase, source nodes distribute their data
streams over their clusters and set up the stage for the long-range MIMO transmissions that we 
want to perform in the next phase. Clusters work in parallel according to the 9-TDMA scheme 
depicted in Figure 4, which divides the total time for this phase into 9 time-slots and assigns 
simultaneous operation to clusters that are sufficiently separated. Let us focus on one specific 
source node s located in cluster S with destination node d in cluster D. Node s will divide a 
block of length LM bits of its data stream into M sub-blocks each of length L bits, where L 
can be arbitrarily large but bounded. The destination of each sub-block in Phase 1 depends on 
the relative position of clusters S and D: 

(1) If $S$ and $D$ are either the same cluster or are not neighboring clusters: One sub-block is
to be kept in $s$ and the remaining $M - 1$ sub-blocks are to be transmitted to the other $M - 1$ nodes
located in $S$, one sub-block for each node.

(2) If $S$ and $D$ are neighboring clusters: Divide the cluster $S$ into two halves, each of area
$A_c/2$, one half located close to the border with $D$ and the second half located farther from
$D$. The $M$ sub-blocks of source node $s$ are to be distributed to the $M/2$ nodes located in
the second half cluster (farther from $D$), so that each node gets two sub-blocks.

Since the above traffic is required for every source node in cluster $S$, we end up with a
highly uniform traffic demand of delivering $M \times LM = LM^2$ bits in total to their destinations. A key
observation is that the problem can be separated into sub-problems, each similar to our original
problem, but on a network of size $M$ and area $A_c$. More specifically, the traffic of transporting
$LM^2$ bits can be handled by organizing $M$ sessions and assigning $M$ source-destination pairs
for each session. (Note that due to the non-uniformity arising from point (2) above, one might
be able to assign only $M/2$ source-destination pairs in a session and hence need to handle the
traffic demand of transporting $LM^2$ bits by organizing up to $2M$ sessions in the extreme case
instead of $M$.) The assigned source-destination pairs in each session can then communicate $L$
bits. Since our channel model is scale invariant, the scheme in the hypothesis of Lemma 3.1 can
be used to handle the traffic in each session, by simply scaling down the powers of the nodes by
$(A_c)^{\alpha/2}$. Hence, the power used by each node will be bounded above by $P(A_c)^{\alpha/2}/M$. The scheme
is to be operated simultaneously inside all the clusters in the 9-TDMA scheme, so we need to
ensure that the resultant inter-cluster interference satisfies the properties in Lemma 3.1.

Lemma 4.2: Consider clusters of size $M$ and area $A_c$ operating according to the 9-TDMA scheme
in Figure 4 in a network of size $n$. Let each node be constrained to an average power $P(A_c)^{\alpha/2}/M$.
For $\alpha > 2$, the interference power received by a node from the simultaneously operating clusters
is upper bounded by a constant $K_{I_1}$ independent of $n$. For $\alpha = 2$, the interference power is
bounded by $K_{I_2} \log n$ for $K_{I_2}$ independent of $n$. Moreover, the interference signals received by
different nodes in the cluster are zero-mean and uncorrelated.

Let us for now concentrate on the case $\alpha > 2$. By Lemma 4.2, the inter-cluster interference has
bounded power and is uncorrelated across different nodes. Thus, the strategy in the hypothesis
of Lemma 3.1 can achieve an aggregate rate $K_1 M^b$ in each session for some $K_1 > 0$, with
probability larger than $1 - e^{-M^{c_1}}$. Using the union bound, with probability larger than $1 - 2n e^{-M^{c_1}}$,
the aggregate rate $K_1 M^b$ is achieved inside all sessions in all clusters in the network. (Recall
that the number of sessions in one cluster can be $2M$ in the extreme case and there are $n/M$
clusters in total.) With this aggregate rate, each session can be completed in at most $(L/K_1)M^{1-b}$
channel uses and $2M$ successive sessions are completed in $(2L/K_1)M^{2-b}$ channel uses. Using
the 9-TDMA scheme, the phase is completed in less than $(18L/K_1)M^{2-b}$ channel uses all over
the network with probability larger than $1 - 2n e^{-M^{c_1}}$.
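The factor 9 in the Phase 1 duration comes from the 9-TDMA reuse pattern. The sketch below is one possible way to realize such a pattern, assuming clusters are indexed by (row, column) on a square grid and a slot is assigned by the residues modulo 3; this indexing rule is an illustrative assumption, not something the text specifies beyond Figure 4.

```python
def tdma_slot(row, col):
    """Slot index in 0..8 for the cluster at grid position (row, col) (assumed rule)."""
    return 3 * (row % 3) + (col % 3)


def active_clusters(slot, clusters_per_side):
    """All clusters that operate simultaneously in the given slot."""
    return [(r, c)
            for r in range(clusters_per_side)
            for c in range(clusters_per_side)
            if tdma_slot(r, c) == slot]


def min_gap_between_active(slot, clusters_per_side):
    """Smallest number of idle clusters separating two simultaneously active clusters."""
    act = active_clusters(slot, clusters_per_side)
    return min(max(abs(r1 - r2), abs(c1 - c2)) - 1
               for i, (r1, c1) in enumerate(act)
               for (r2, c2) in act[i + 1:])


if __name__ == "__main__":
    side = 9
    for slot in range(9):
        print(slot, len(active_clusters(slot, side)), min_gap_between_active(slot, side))
```

With this rule, any two simultaneously active clusters are separated by at least two idle clusters, i.e. by a distance of at least $2\sqrt{A_c}$, which is the separation used for the nearest interferers in the proof of Lemma 4.2.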

Phase 2: MIMO Transmissions. In this phase, we perform the actual MIMO transmissions
for all the source-destination pairs serially, i.e., one at a time. A MIMO transmission
from source $s$ to destination $d$ goes from the $M$ (or $M/2$) nodes in the cluster $S$ that contains $s$
(referred to as the source cluster for this MIMO transmission) to the $M$ (or $M/2$) nodes of the
cluster $D$ that contains $d$ (referred to as the destination cluster of the MIMO transmission).





Fig. 4. Buffers of the nodes in a cluster are illustrated before and after the data exchanges in Phase 1. The data stream of the
source nodes are distributed to the $M$ nodes in the network as depicted. $b_s(j)$ denotes the $j$'th sub-block of the source node $s$.
Note the 9-TDMA scheme that is employed over the network in this phase.



Let the distance between the mid-points of the two clusters be $r_{SD}$. If $S$ and $D$ are the
same cluster, we skip the step for this source-destination pair $s$-$d$. Otherwise, we operate in
two slightly different modes depending on the relative positions of $S$ and $D$. Each mode is a
continuation of the operations performed in the first phase. First consider the case where $S$ and
$D$ are not neighboring clusters. In this case, the $M$ nodes in cluster $S$ independently encode the
$L$ bits-long sub-blocks they possess, originally belonging to node $s$, into $C$ symbols by using a
randomly generated Gaussian code $\mathcal{C}$ that respects an average transmit power constraint $P(r_{SD})^\alpha/M$.
The nodes then transmit their encoded sequences of length $C$ symbols simultaneously to the
$M$ nodes in cluster $D$. The nodes in cluster $D$ properly sample the signals they observe during
the $C$ transmissions and store these samples (that we will simply refer to as observations in
the following text), without trying to decode the transmitted symbols. In the case where $S$ and
$D$ are neighbors, the strategy is slightly modified so that the MIMO transmission is from the
$M/2$ nodes in $S$ that possess the sub-blocks of $s$ after Phase 1 to the $M/2$ nodes in $D$ that
are located in the half of the cluster farther from $S$. Each of these $M/2$ nodes in $S$ possesses two
sub-blocks that come from $s$. They encode each sub-block into $C$ symbols by again using a
Gaussian code of power $P(r_{SD})^\alpha/M$. The nodes then transmit the $2C$ symbols to the $M/2$ nodes
in $D$, which in turn sample their received signals and store the observations. The observations
accumulated at various nodes in $D$ at the end of this step are to be conveyed to node $d$ during
the third phase.

After concluding the step for the pair $s$-$d$, the phase continues by repeating the same step
for the next source node $s + 1$ in $S$ and its destination $d'$. Note that the destination cluster for
this new MIMO transmission is, in general, a different cluster $D'$, which is the one that contains
the destination node $d'$. The MIMO transmissions are repeated until the data originated from
all source nodes in the network are transmitted to their respective destination clusters. Since the
step for one source-destination pair takes either $C$ or $2C$ channel uses, completing the operation
for all $n$ source nodes in the network requires at most $2C \times n = 2Cn$ channel uses.

Phase 3: Cooperate to Decode. In this phase, we aim to provide each destination node with
the observations of the symbols that were originally intended for it. With the MIMO
transmissions in the second phase, these observations have been accumulated at the nodes of its
cluster. As before, let us focus on a specific destination node $d$ located in cluster $D$. Note that
depending on whether the source node of $d$ is located in a neighboring cluster or not, either
each of the $M$ nodes in $D$ has $C$ observations intended for $d$, or $M/2$ of the nodes have $2C$
observations each. Note that these observations are real numbers that need to be quantized
and encoded into bits before being transmitted. Let us assume that we are encoding each block
of $C$ observations into $CQ$ bits, by using a fixed $Q$ bits per observation on average. The
situation is symmetric for all $M$ destination nodes in $D$, since the cluster received $M$ MIMO
transmissions in the previous phase, one for each destination node. (The destination nodes that
have source nodes in $D$ are an exception. Recall from Phase 1 and Phase 2 that in this case, each
node in $D$ possesses sub-blocks of the original data stream for the destination node, not MIMO
observations. We will ignore this case by simply assuming $L < CQ$ in the computation below.)
The arising traffic demand of transporting $M \times CQM = CQM^2$ bits in total is similar to Phase 1 and can be
handled by using exactly the same scheme in less than $(2CQ/K_1)M^{2-b}$ channel uses. Recalling
the discussion on the first phase, we conclude that the phase can be completed in less than
$(18CQ/K_1)M^{2-b}$ channel uses all over the network with probability larger than $1 - 2n e^{-M^{c_1}}$.

Note that if it were possible to encode each observation into a fixed $Q$ bits without introducing
any distortion, which is obviously not the case, the following lemma on MIMO capacity would
suggest that with the Gaussian code $\mathcal{C}$ used in Phase 2 satisfying $L/C \ge \kappa$ for some constant
$\kappa > 0$, the transmitted bits could be recovered with an arbitrarily small probability of error from
the observations gathered by the destination nodes at the end of Phase 3.

Lemma 4.3: The mutual information achieved by the $M \times M$ MIMO transmission between
any two clusters grows at least linearly with $M$.

The following lemma states that there is actually a way to encode the observations using 
fixed number of bits per observation and at the same time, not to degrade the performance of 
the overall channel significantly, that is, to still get a linear capacity growth for the resulting 
quantized MIMO channel. 

Lemma 4.4: There exists a strategy to encode the observations at a fixed rate $Q$ bits per
observation and get a linear growth of the mutual information for the resultant $M \times M$ quantized
MIMO channel.

We leave the proof of the lemma to Appendix III; however, the following small lemma may
provide motivation for the stated result. Lemma 4.5 points out a key observation on the way
we choose our transmit powers in the MIMO phase. It is central to the proof of Lemma 4.4
and states that the observations have bounded power that does not scale with $M$. This in turn
suggests that one can use a fixed number of bits to encode them without degrading the scaling
performance of the scheme.

Lemma 4.5: In Phase 2, the power received by each node in the destination cluster is bounded
below and above by constants $P_1$ and $P_2$ respectively that are independent of $M$.

Putting it together, we have seen that the three phases described effectively realize virtual
MIMO channels achieving spatial multiplexing gain $M$ between the source and destination
nodes in the network. Using these virtual MIMO channels, each source is able to transmit $ML$
bits in

$$T_t = T(\text{phase 1}) + T(\text{phase 2}) + T(\text{phase 3}) = \frac{18L}{K_1} M^{2-b} + 2Cn + \frac{18CQ}{K_1} M^{2-b}$$

total channel uses, where $L/C \ge \kappa$ for some $\kappa > 0$ independent of $M$ (or $n$). This gives an
aggregate throughput of

$$T(n) = \frac{nML}{(18L/K_1)M^{2-b} + 2Cn + (18CQ/K_1)M^{2-b}} \ge K_2\, n^{\frac{1}{2-b}} \qquad (3)$$

for some $K_2 > 0$ independent of $n$, by choosing $M = n^{\frac{1}{2-b}}$ with $0 \le b < 1$, which is the optimal
choice for the cluster size as a function of $b$. A failure arises if there are not order $M/2$ nodes in
each half cluster or the scheme used in Phases 1 and 3 fails to achieve the promised throughput.
Combining the result of Lemma 4.1 with the computed failure probabilities for Phases 1 and 3
yields

$$P_f \le 4n\, e^{-M^{c_1}} + \frac{8n}{M}\, e^{-\Lambda(\delta) M/2} \le e^{-n^{c_2}}$$

for some $c_2 > 0$.
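The role of the cluster size in (3) can be explored numerically. The following is a rough sketch, assuming unit values for the constants $L$, $C$, $Q$ and $K_1$ (an arbitrary normalization chosen only for illustration); it evaluates the three phase durations and the resulting throughput for a given $M$.

```python
def total_channel_uses(n, M, b, L=1.0, C=1.0, Q=1.0, K1=1.0):
    """T_t: durations of Phases 1, 2 and 3 added up (constants are assumed values)."""
    phase1 = (18 * L / K1) * M ** (2 - b)
    phase2 = 2 * C * n
    phase3 = (18 * C * Q / K1) * M ** (2 - b)
    return phase1 + phase2 + phase3


def aggregate_throughput(n, M, b, L=1.0, C=1.0, Q=1.0, K1=1.0):
    """T(n) = n M L / T_t, the left-hand side of (3)."""
    return n * M * L / total_channel_uses(n, M, b, L, C, Q, K1)


def balanced_cluster_size(n, b):
    """Cluster size M = n^(1/(2-b)), which makes M^(2-b) equal to n."""
    return n ** (1.0 / (2.0 - b))


if __name__ == "__main__":
    n, b = 10 ** 12, 0.0
    for gamma in (0.3, 0.5, 0.8):
        M = n ** gamma
        print(gamma, aggregate_throughput(n, M, b))
```

With $M = n^{1/(2-b)}$ all three terms of $T_t$ are proportional to $n$, so the throughput is a constant multiple of $M$. A polynomially smaller $M$ wastes the MIMO phase, while a polynomially larger $M$ makes the cooperation phases dominate.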

Next, we show that the per node average power used by the new scheme is also bounded above
by $P/n$: for Phases 1 and 3, we know that the scheme employed inside the clusters uses average
per node power bounded above by $P A_c^{\alpha/2}/M$. Indeed, $A_c = M/n$, and for $\alpha > 2$ we have

$$\frac{P A_c^{\alpha/2}}{M} = \frac{P}{M}\left(\frac{M}{n}\right)^{\alpha/2} = \frac{P}{n}\left(\frac{M}{n}\right)^{\alpha/2-1} \le \frac{P}{n}.$$

In Phase 2, each node is transmitting with power $P(r_{SD})^\alpha/M$ in at most a fraction $M/n$ of the total
duration of the phase, while keeping silent during the rest of the time. This yields a per node
average power $P(r_{SD})^\alpha/n$. Recall that $r_{SD}$ is the distance between the mid-points of the source and
destination clusters and $r_{SD} < 1$, which yields the upper bound $P/n$ on the per-node average
power also for the second phase.

In order to conclude the proof of Lemma 3.1, we should note that the new scheme achieves
the same aggregate throughput scaling when the network experiences interference from the
exterior. In Phases 1 and 3, this external interference with bounded power will simply add to
the inter-cluster interference experienced by the nodes. For the MIMO phase, this will result in
uncorrelated background-noise-plus-interference at the receiving nodes which is not necessarily
Gaussian. In Appendix II and III, we prove the results stated in Lemma 4.3 and Lemma 4.4 for
this more general case. This concludes the proof of Lemma 3.1. □




Proof of Lemma 3.2: The scheme that proves Lemma 3.2 is completely similar to the one
described above. Lemma 4.2 states that when $\alpha = 2$, the inter-cluster interference power
experienced during Phases 1 and 3 is upper bounded by $K_{I_2} \log n$, which is a constant multiple of
$\log M$ because $M$ is a fixed power of $n$. From the
assumptions in the lemma, there is furthermore the external interference with power bounded by
$K_I \log n$ that is adding to the inter-cluster interference. Under these conditions, the scheme in
the hypothesis of Lemma 3.2 achieves an aggregate rate $K_1 M^b/\log M$ when used to handle the traffic
in these phases. For the second phase we have the following lemma, which provides a lower
bound on the spatial multiplexing gain of the quantized MIMO channel under the interference
experienced.

Lemma 4.6: Let the MIMO signal received by the nodes in the destination cluster be corrupted
by an interference of power $K_I \log M$, uncorrelated over different nodes and independent of the
transmitted signals. There exists a strategy to encode these corrupted observations at a fixed rate
$Q$ bits per observation and get an $M/\log M$ growth of the mutual information for the resulting
$M \times M$ quantized MIMO channel.

A capacity of $M/\log M$ for the resulting MIMO channel implies that there exists a code $\mathcal{C}$ that
encodes $L$ bits-long sub-blocks into $C \log M$ symbols, where $L/C \ge \kappa'$ for a constant $\kappa' > 0$, so
that the transmitted bits can be decoded at the destination nodes with arbitrarily small probability
of error for $L$ and $C$ sufficiently large. Hence, starting again with a block of $LM$ bits in each
source node, the $LM^2$ bits in the first phase can be delivered in $(L/K_1)M^{2-b}\log M$ channel
uses. In the second phase, the $L$ bits-long sub-blocks now need to be encoded into $C \log M$
symbols, hence the transmission for each source-destination pair takes $C \log M$ channel uses,
the whole phase taking $Cn \log M$ channel uses. Note that there are now $CM^2 \log M$ observations
encoded into $CQM^2 \log M$ bits that need to be transported in the third phase. With the scheme of
aggregate rate $K_1 M^b/\log M$, we need $(CQ/K_1)M^{2-b}(\log M)^2$ channel uses to complete the phase.
Choosing $M = n^{\frac{1}{2-b}}$ gives an aggregate throughput of $K_2\, n^{\frac{1}{2-b}}/(\log n)^2$ for the new scheme. This
concludes the proof of Lemma 3.2. □

We continue with the proofs of the lemmas introduced in the section: 

Proof of Lemma 4.1: The proof of the lemma is a standard application of Chebyshev's inequality.
Note that the number of nodes in a given cell is a sum of $n$ i.i.d. Bernoulli random variables $B_i$,
such that $P(B_i = 1) = A_c$. Hence, for any $s > 0$,

$$\begin{aligned}
P\Big(\sum_{i=1}^{n} B_i \ge (1+\delta)A_c n\Big) &= P\Big(e^{s\sum_{i=1}^{n} B_i} \ge e^{s(1+\delta)A_c n}\Big) \\
&\le \big(e^{s} A_c + (1 - A_c)\big)^{n}\, e^{-s(1+\delta)A_c n} \\
&\le e^{-A_c n\,(s(1+\delta) - e^{s} + 1)} \\
&= e^{-A_c n \Lambda_+(\delta)}
\end{aligned}$$

where $\Lambda_+(\delta) = (1+\delta)\log(1+\delta) - \delta$ by choosing $s = \log(1+\delta)$. The proof for the lower
bound follows similarly by considering the random variables $-B_i$. The conclusion follows from
the union bound. □
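The upper-tail bound derived above can be compared against the exact binomial tail. This is a small numerical sketch, assuming example values for $n$, $A_c$ and $\delta$ that are not taken from the paper; the function names are illustrative.

```python
import math


def chernoff_upper_tail_bound(n, a_c, delta):
    """exp(-A_c n Lambda_+(delta)) with Lambda_+(delta) = (1+delta) log(1+delta) - delta."""
    lam = (1 + delta) * math.log(1 + delta) - delta
    return math.exp(-a_c * n * lam)


def exact_upper_tail(n, a_c, delta):
    """P(sum of n Bernoulli(A_c) variables >= (1 + delta) A_c n), computed exactly."""
    threshold = math.ceil((1 + delta) * a_c * n)
    return sum(math.comb(n, k) * a_c ** k * (1 - a_c) ** (n - k)
               for k in range(threshold, n + 1))


if __name__ == "__main__":
    n, a_c, delta = 1000, 0.05, 0.5     # assumed example: 50 nodes per cell on average
    print(exact_upper_tail(n, a_c, delta), chernoff_upper_tail_bound(n, a_c, delta))
```

The exact tail is always below the exponential bound; the bound is loose by a constant factor in the exponent but is enough for the union bound over $1/A_c$ cells.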

Proof of Lemma 4.2: Consider a node $v$ in cluster $V$ operating under the 9-TDMA scheme in
Figure 5. The interfering signal received by this node from the simultaneously operating clusters
$\mathcal{U}_V$ is given by

$$I_v = \sum_{U \in \mathcal{U}_V} \sum_{j \in U} H_{vj} X_j$$

where $H_{vj}$ are the channel coefficients given by (1) and $X_j$ is the signal transmitted by node $j$
which is located in a simultaneously operating cluster $U$. First note that the signals $I_v$ and $I_{v'}$
received by two different nodes $v$ and $v'$ in $V$ are uncorrelated since the channel coefficients
$H_{vj}$ and $H_{v'j}$ are independent for all $j$. The power of the interfering signal $I_v$ is given by

$$P_I = \sum_{U \in \mathcal{U}_V} \sum_{j \in U} \frac{G P_j}{r_{vj}^{\alpha}}$$

by using the fact that channel coefficients corresponding to different nodes $j$ are independent.
As illustrated by Figure 5, the interfering clusters $\mathcal{U}_V$ can be grouped such that each group $\mathcal{U}_V(i)$
contains $8i$ clusters or less and all clusters in group $\mathcal{U}_V(i)$ are separated by a distance larger
than $(3i - 1)\sqrt{A_c}$ from $V$ for $i = 1, 2, \dots$, where $A_c$ is the cluster area. The number of such
groups can be simply bounded by the number of clusters $n/M$ in the network. Thus,

$$P_I < \sum_{i=1}^{n/M} \sum_{U \in \mathcal{U}_V(i)} \sum_{j \in U} \frac{P A_c^{\alpha/2}}{M} \cdot \frac{G}{\big((3i-1)\sqrt{A_c}\big)^{\alpha}} < GP \sum_{i=1}^{n/M} \frac{8i}{(3i-1)^{\alpha}}, \qquad (4)$$

where we have used the fact that the powers of the signals are bounded by $P_j \le P A_c^{\alpha/2}/M$, $\forall j$.
The sum in (4) is convergent for $\alpha > 2$, thus is bounded by a constant $K_{I_1}$. For $\alpha = 2$, the sum
can be bounded by $K_{I_2} \log n$ where $K_{I_2}$ is a constant independent of $n$.

Fig. 5. Grouping of interfering clusters in the 9-TDMA Scheme.
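The different behavior of the sum in (4) for $\alpha > 2$ and $\alpha = 2$ is easy to observe numerically. The sketch below simply evaluates the partial sums $\sum_i 8i/(3i-1)^\alpha$; the chosen values of $\alpha$ and of the number of groups are illustrative assumptions.

```python
def interference_sum(alpha, num_groups):
    """Partial sum of 8 i / (3 i - 1)^alpha over the first num_groups interferer groups."""
    return sum(8 * i / (3 * i - 1) ** alpha for i in range(1, num_groups + 1))


if __name__ == "__main__":
    for alpha in (2.0, 2.5, 3.0):
        print(alpha, [round(interference_sum(alpha, g), 3) for g in (10, 10 ** 3, 10 ** 5)])
```

For $\alpha = 3$ the partial sums settle quickly, whereas for $\alpha = 2$ they keep growing like the harmonic series, that is, logarithmically in the number of groups.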

Proof of Lemma 4.5: We consider only the case where the source cluster $S$ and the destination
cluster $D$ are not neighbors. The argument for the other case follows similarly. The signal
received by a destination node $d$ located in cluster $D$ during MIMO transmission from source
cluster $S$ is given by

$$Y_d = \sum_{s=1}^{M} H_{ds} X_s + Z_d$$

where $X_s$ is the signal sent by a source node $s \in S$ constrained to power $P(r_{SD})^\alpha/M$ and $Z_d$ is
$\sim \mathcal{N}_{\mathbb{C}}(0, N_0)$. The power of this signal is given by

$$E[|Y_d|^2] = \sum_{s=1}^{M} E[|H_{ds}|^2]\, E[|X_s|^2] + N_0 = \frac{PG}{M} \sum_{s=1}^{M} \left(\frac{r_{SD}}{r_{sd}}\right)^{\alpha} + N_0$$

where we use the fact that all $H_{ds}$, $X_s$ and $Z_d$ are independent. Observe that
$r_{SD} - \sqrt{2A_c} \le r_{sd} \le r_{SD} + \sqrt{2A_c}$, while $r_{SD} \ge 2\sqrt{A_c}$. These two relations yield

$$\frac{\sqrt{2}}{\sqrt{2}+1} \le \frac{r_{SD}}{r_{sd}} \le \frac{\sqrt{2}}{\sqrt{2}-1}$$

which in turn yields the following lower and upper bounds for the received power at each
destination node

$$P_1 \equiv \left(\frac{\sqrt{2}}{\sqrt{2}+1}\right)^{\alpha} GP + N_0 \le E[|Y_d|^2] \le \left(\frac{\sqrt{2}}{\sqrt{2}-1}\right)^{\alpha} GP + N_0 \equiv P_2. \qquad (5)$$

□
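The bounds in (5) can be checked by sampling node positions. The following sketch assumes square clusters of side `side`, a source cluster at the origin and a destination cluster shifted by an integer number of cluster widths, with unit values for $G$, $P$ and $N_0$; these choices and the function names are illustrative assumptions.

```python
import math
import random


def power_bounds(alpha, G=1.0, P=1.0, N0=1.0):
    """The constants P_1 and P_2 of (5)."""
    lo = (math.sqrt(2) / (math.sqrt(2) + 1)) ** alpha * G * P + N0
    hi = (math.sqrt(2) / (math.sqrt(2) - 1)) ** alpha * G * P + N0
    return lo, hi


def received_power(alpha, M, offset, side=1.0, G=1.0, P=1.0, N0=1.0, seed=0):
    """E|Y_d|^2 for one random destination node and M random source nodes.

    offset = (ox, oy) is the shift of the destination cluster in cluster widths;
    non-neighboring clusters need max(|ox|, |oy|) >= 2.
    """
    rng = random.Random(seed)
    ox, oy = offset
    r_SD = side * math.hypot(ox, oy)                  # distance between cluster mid-points
    dx = (ox + rng.random()) * side
    dy = (oy + rng.random()) * side
    total = N0
    for _ in range(M):
        sx, sy = rng.random() * side, rng.random() * side
        r_sd = math.hypot(dx - sx, dy - sy)
        total += (P * r_SD ** alpha / M) * G / r_sd ** alpha
    return total


if __name__ == "__main__":
    print(power_bounds(3.0), received_power(3.0, 100, (2, 0)))
```

Because every transmitter scales its power by $(r_{SD})^\alpha$, the received power stays between $P_1$ and $P_2$ regardless of $M$ and of how far apart the two clusters are.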

V. Extended Networks 
A. Bursty Hierarchical Scheme does better than Multihop for $\alpha < 3$

So far, we have considered dense networks, where the total geographical area is fixed and the
density of nodes is increasing. Another natural scaling is the extended case, where the density of
nodes is fixed and the area is increasing, a $\sqrt{n} \times \sqrt{n}$ square. This models the situation where
we want to scale the network to cover an increasing geographical area.

As compared to dense networks, the distance between nodes is increased by a factor of $\sqrt{n}$,
and hence for the same transmit powers, the received powers are all decreased by a factor of
$n^{\alpha/2}$. Equivalently, by rescaling space, an extended network can just be considered as a dense
network on a unit area but with the average power constraint per node reduced to $P/n^{\alpha/2}$ instead
of $P$.

Lemmas 3.1 and 3.2 state that the average power per node required to run our hierarchical
scheme in dense networks is not the full power $P$ but $P/n$. In light of the observation above,
this immediately implies that when $\alpha = 2$, we can directly apply our scheme to extended
networks and achieve a linear scaling. For extended networks with $\alpha > 2$, our scheme would
not satisfy the equivalent power constraint $P/n^{\alpha/2}$ and we are now in the power-limited regime
(as opposed to the degrees-of-freedom limited regime). However, we can consider a simple
"bursty" modification of the hierarchical scheme which runs the hierarchical scheme a fraction

$$\frac{1}{n^{\alpha/2-1}}$$

of the time with power $P/n$ per node and remains silent for the rest of the time. This meets the
given average power constraint of $P/n^{\alpha/2}$, and achieves an aggregate throughput of

$$\frac{1}{n^{\alpha/2-1}}\; n^{1-\epsilon} = n^{2-\alpha/2-\epsilon} \ \text{bits/second}.$$

Note that the quantity $n^{2-\alpha/2} = n^2 \cdot n^{-\alpha/2}$ can be interpreted as the total power transferred
between a size $n$ transmit cluster and a size $n$ receive cluster, $n^2$ node pairs in all, with a power
attenuation of $n^{-\alpha/2}$ for each node pair. This power transfer is taking place at the top level of the
hierarchy (see Figure 3). The fact that the achievable rate is proportional to the power transfer
further emphasizes that our scheme is power-limited rather than degrees-of-freedom limited in
extended networks.

Let us compare our scheme to multihop. For $\alpha < 3$, it performs strictly better than multihop,
while for $\alpha > 3$, it performs worse. Summarizing these observations, we have the following
achievability theorem for extended networks, the counterpart to Theorem 3.2 for dense networks.

Theorem 5.1: Consider an extended network on a $\sqrt{n} \times \sqrt{n}$ square. There are two cases.

• $2 \le \alpha < 3$: For every $\epsilon > 0$, with high probability, an aggregate throughput:

$$T(n) \ge K\, n^{2-\alpha/2-\epsilon}$$

is achievable in the network for all possible pairings between sources and destinations.
$K > 0$ is a constant independent of $n$ and the source-destination pairing.

• $\alpha \ge 3$: With high probability, an aggregate throughput:

$$T(n) \ge K \sqrt{n}$$

is achievable in the network for all possible pairings between sources and destinations.
$K > 0$ is a constant independent of $n$ and the source-destination pairing.
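The two regimes of Theorem 5.1 and the duty cycle of the bursty scheme can be summarized in a few lines. The sketch below assumes the multihop exponent $1/2$ and the hierarchical exponent $2 - \alpha/2 - \epsilon$ discussed above; the function names are illustrative.

```python
def duty_cycle(n, alpha):
    """Fraction of time the bursty scheme is active: 1 / n^(alpha/2 - 1)."""
    return n ** -(alpha / 2 - 1)


def bursty_average_power(n, alpha, P=1.0):
    """Average per-node power: P/n while active, zero otherwise."""
    return (P / n) * duty_cycle(n, alpha)


def hierarchical_exponent(alpha, eps=0.0):
    """Throughput exponent of the bursty hierarchical scheme."""
    return 2 - alpha / 2 - eps


def achievable_exponent(alpha, eps=0.0):
    """Better of the bursty hierarchical scheme and multihop (exponent 1/2)."""
    return max(hierarchical_exponent(alpha, eps), 0.5)


if __name__ == "__main__":
    for alpha in (2.0, 2.5, 3.0, 4.0):
        print(alpha, achievable_exponent(alpha), duty_cycle(10 ** 4, alpha))
```

The average power returned by `bursty_average_power` equals $P/n^{\alpha/2}$, the rescaled power constraint of the extended network, and the two exponents cross at $\alpha = 3$.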

Note that because of the bursty transmission strategy, the hierarchical scheme has a high peak- 
to-average power ratio. However, although we talk in terms of time in the above discussion, 
such burstiness can just as well be implemented over frequency with only a fraction of the total 
bandwidth W used. For example, this can be implemented in an OFDM system, using a subset 
of the sub-carriers at any one time, but putting more energy in the active sub-carriers. This way, 
the peak power remains constant over time. 

B. Cutset Upper Bound for Random S-D Pairings 

Can we do better than the scaling in Theorem 5.1? So far we have been considering arbitrary
source-destination pairings but clearly there are some pairings for which a much better scaling
can be achieved. For example, if the source-destinations are all nearest neighbors to each other,
then a linear capacity scaling can be achieved for any $\alpha$. Thus, for the extended network case, we
need to narrow down the class of S-D pairings to prove a sensible upper bound. In this section,
we will therefore focus on random S-D pairings, assuming that the pairs are chosen according to
a random permutation of the set of nodes, without any consideration of node locations. We prove
a high probability upper bound that matches the achievability result in Theorem 5.1 within a
polynomial factor of arbitrarily small exponent. Theorem 5.2 together with Theorem 5.1 therefore
identify the capacity scaling law in extended networks for all values of $\alpha \ge 2$. The rest of the
section is devoted to the proof of the theorem.

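Theorem 5.2: Consider an extended network of $n$ nodes with random source-destination pairing.
For any $\epsilon > 0$, the aggregate throughput is bounded above by

$$T(n) \le \begin{cases} K'\, n^{2-\alpha/2+\epsilon} & 2 \le \alpha < 3 \\ K'\, n^{1/2+\epsilon} & \alpha \ge 3 \end{cases}$$

with high probability for a constant $K' > 0$ independent of $n$.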

Note that the hierarchical scheme is achieving near global cooperation. In the context of
dense networks, this yields a near linear number of degrees of freedom for communication. In
the context of extended networks, in addition to the degrees of freedom provided, this scheme
allows almost all nodes in the network to cooperate in transferring energy between any
source-destination pair. In fact, we saw that in extended networks with $\alpha > 2$, our scheme is
power-limited rather than degrees-of-freedom limited. A natural place to look for a matching upper
bound is to consider a cutset bound on how much power can flow across the network. Our proof
of Theorem 5.2 is a careful evaluation of such a cutset bound.

Proof of Theorem 5.2: We start by considering several properties that are satisfied with high
probability in the random network. The following lemma is similar in spirit to Lemma 4.1 for
dense networks and can be proved using a similar technique. In parallel to the dense case, it
forms the groundwork for our following discussion.

Lemma 5.1: The random network with random source-destination pairing satisfies the following
properties with high probability:

a) Let the network area be divided into $n$ squarelets of unit area. Then, there are less than
$\log n$ nodes inside each squarelet.

b) Let the network area be divided into squarelets each of area $2 \log n$. Then, there is at
least one node inside each squarelet.

c) Consider a cut dividing the network area into two equal halves. The number of
communication requests with sources on the left-half network and destinations on the right-half
network is between $((1-\delta)n/4,\ (1+\delta)n/4)$, for any $\delta > 0$.

Fig. 6. The cut-set considered in the proof of Theorem 5.2. The communication requests that pass across the cut from left to
right are depicted in bold lines.

We consider a cut dividing the $\sqrt{n} \times \sqrt{n}$ network area into two equal halves (see Figure 6).
We are interested in bounding above the sum of the rates of communications passing through
the cut from left to right. By Part (c) of the lemma, this sum-rate is equal to one quarter of the
total throughput $T(n)$ with high probability. The maximum achievable sum-rate between these
source-destination pairs is bounded above by the capacity of the MIMO channel between nodes
$S$ located to the left of the cut and nodes $D$ located to the right. Under the fast fading assumption,
we have

$$\sum_{k \in S,\, i \in D} R_{ik} \le \max_{\substack{Q(H) \ge 0 \\ E(Q_{kk}(H)) \le P,\ \forall k \in S}} E\big(\log\det(I + HQ(H)H^{*})\big), \qquad (6)$$

where

$$H_{ik} = \frac{\sqrt{G}\, e^{j\theta_{ik}}}{r_{ik}^{\alpha/2}}, \quad k \in S,\ i \in D.$$

$Q(\cdot)$ is a mapping from the set of possible channel realizations $H$ to the set of positive
semi-definite transmit covariance matrices. The diagonal element $Q_{kk}(H)$ corresponds to the power
allocated to the $k$th node at channel state $H$. A natural way to upper bound (6) is by relaxing the
individual power constraint to a total transmit power constraint. In the present context however,
this is not convenient: some nodes in $S$ are close to the cut and some are far apart, so the impact
of these nodes on the system performance is quite different. A total transmit power constraint
allows the transfer of power from the nodes far apart to those nodes that are close to the cut,
resulting in a loose bound. Instead, we will relax the individual power constraints to a total
weighted power constraint, where the weight assigned to a node is set to be the total received
power on the other side of the cut per watt of transmit power from that node. However, before
doing that, we need to isolate the contribution of some nodes in $D$ that are located very close
to the cut. Typically, there are a few nodes on both sides of the cut that are located at a distance
as small as order $1/\sqrt{n}$ from the cut. If included, the contribution of these few pairs to the total
received power would be excessive, resulting in a loose bound in the discussion below.

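Let $V_D$ denote the set of nodes located on the $1 \times \sqrt{n}$ rectangular area immediately to the right
of the cut. Note that there are no more than $\sqrt{n} \log n$ nodes in $V_D$ by Part (a) of Lemma 5.1.
By generalized Hadamard's inequality, we have

$$\log\det\big(I + HQ(H)H^{*}\big) \le \log\det\big(I + H^{(1)}Q(H)H^{(1)*}\big) + \log\det\big(I + H^{(2)}Q(H)H^{(2)*}\big)$$

where $H^{(1)}$ and $H^{(2)}$ are obtained by partitioning the original matrix $H$: $H^{(1)}$ is the rectangular
matrix with entries $H_{ik}$, $k \in S$, $i \in V_D$ and $H^{(2)}$ is the rectangular matrix with entries $H_{ik}$,
$k \in S$, $i \in D \setminus V_D$. In turn, (6) is bounded above by

$$\begin{aligned}
\sum_{k \in S,\, i \in D} R_{ik} \le{} & \max_{\substack{Q(H^{(1)}) \ge 0 \\ E(Q_{kk}(H^{(1)})) \le P,\ \forall k \in S}} E\big(\log\det(I + H^{(1)}Q(H^{(1)})H^{(1)*})\big) \\
& + \max_{\substack{Q(H^{(2)}) \ge 0 \\ E(Q_{kk}(H^{(2)})) \le P,\ \forall k \in S}} E\big(\log\det(I + H^{(2)}Q(H^{(2)})H^{(2)*})\big). \qquad (7)
\end{aligned}$$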

The first term in (7) can be easily upper bounded by applying Hadamard's inequality once more or
equivalently by considering the sum of the capacities of the individual MISO channels between
nodes in $S$ and each node in $V_D$. A discussion similar to the proof of Theorem 3.1 that makes
use of the fact that the minimum distance between any two nodes in the network is larger than
$1/n^{1/2+\delta}$ with high probability for any $\delta > 0$, yields the following upper bound for the first term

$$\max_{\substack{Q(H^{(1)}) \ge 0 \\ E(Q_{kk}(H^{(1)})) \le P,\ \forall k \in S}} E\big(\log\det(I + H^{(1)}Q(H^{(1)})H^{(1)*})\big) \le K' \sqrt{n}\, (\log n)^2$$

where $K' > 0$ is a constant independent of $n$.

The second term in (7) is the capacity of the MIMO channel between nodes in $S$ and nodes
in $D \setminus V_D$. This is the term that dominates in (7) and thus its scaling determines the scaling of
(6). The result is given by the following lemma, which completes the proof of Theorem 5.2.

Lemma 5.2: Let $P_{tot}(n)$ be the total power received by all the nodes in $D \setminus V_D$, when nodes
in $S$ are transmitting independent signals at full power. Then for every $\epsilon > 0$,

$$\max_{\substack{Q(H^{(2)}) \ge 0 \\ E(Q_{kk}(H^{(2)})) \le P,\ \forall k \in S}} E\big(\log\det(I + H^{(2)}Q(H^{(2)})H^{(2)*})\big) \le n^{\epsilon} P_{tot}(n).$$

Moreover, the scaling of the total received power can be evaluated to be

$$P_{tot}(n) \le \begin{cases}
K'\, n\, (\log n)^3 & \alpha = 2 \\
K'\, n^{2-\alpha/2} (\log n)^2 & 2 < \alpha < 3 \\
K' \sqrt{n}\, (\log n)^3 & \alpha = 3 \\
K' \sqrt{n}\, (\log n)^2 & \alpha > 3
\end{cases}$$

with high probability for a constant $K' > 0$ independent of $n$.

□



Lemma 5.2 says two things of importance. First, it says that independent signaling at the
transmit nodes is sufficient to achieve the cutset upper bound, as far as scaling is concerned. 
There is therefore no need, in order for the transmit nodes to cooperate, to do any sort of 
transmit beamforming. This is fortuitous since our hierarchical MIMO performs only indepen- 
dent signalling across the transmit nodes in the long-range MIMO phase. Second, it identifies 
the total received power under independent transmissions as the fundamental quantity limiting 
performance. Depending on a, there is a dichotomy on how this quantity scales with the system 
size. This dichotomy can be interpreted as follows. 

The total received power is dominated either by the power transferred between nodes near the 
cut (order 1 distance) or by the power transferred between nodes far away from the cut. There 
are relatively fewer node pairs near the cut than away from the cut (order $\sqrt{n}$ versus order
$n^2$), but the channels between the nodes near the cut are considerably stronger than between the
nodes far away from the cut. When the attenuation parameter $\alpha$ is less than 3, the received power
is dominated by transfer between nodes far away from the cut. The hierarchical scheme, which
involves at the top level of the hierarchy MIMO transmissions between clusters of size $n^{1-\epsilon}$
at distance $\sqrt{n}$ apart, achieves arbitrarily closely the required power transfer and is therefore
optimal in this regime. When a > 3, the received power in the cutset bound is dominated by 
the power transfer by the nodes near the cut. This can be achieved by nearest neighbor multihop 
and multihop is therefore optimal in this regime. 

It should be noted that earlier works identified thresholds on a above which nearest neighbor 
multihop is order-optimal (a > 6 first established in [5], then subsequently refined to hold for 
a > 5 in [7], a > 4.5 in [10] and a > 4 in [2]). All of them essentially use the same cutset 
bound as we did. The fact that they did not identify the tightest threshold (which we are showing 
to be 3) is because their upper bounds on the cutset bound are not tight. 

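Proof of Lemma 5.2: We are interested in the scaling of the MIMO capacity,

$$\max_{\substack{Q(H^{(2)}) \ge 0 \\ E(Q_{kk}(H^{(2)})) \le P,\ \forall k \in S}} E\left(\log\det\left(I + H^{(2)}Q(H^{(2)})H^{(2)*}\right)\right). \qquad (8)$$

Let us rescale each column k of the matrix by the (square root of the) total received power
on the right from source node k on the left. Let indeed $P_k$ denote the total received power in
$D \setminus V_D$ of the signal sent by user $k \in S$:

$$P_k = PG \sum_{i \in D \setminus V_D} r_{ik}^{-\alpha} = PG\, d_k.$$

The expression (8) is then equal to

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(Q_{kk}(\tilde H)) \le P_k,\ \forall k \in S}} E\left(\log\det\left(I + \tilde H Q(\tilde H) \tilde H^*\right)\right)$$

where

$$\tilde H_{ik} = \frac{1}{\sqrt{G\, d_k}}\, H^{(2)}_{ik} = \frac{e^{j\theta_{ik}}}{r_{ik}^{\alpha/2}\sqrt{d_k}}.$$

The above expression is in turn bounded above by

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\log\det\left(I + \tilde H Q(\tilde H) \tilde H^*\right)\right)$$

where $P_{tot}(n) = \sum_{k \in S} P_k = PG \sum_{k \in S,\, i \in D \setminus V_D} r_{ik}^{-\alpha}$.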



Let us now define, for given $n \ge 1$ and $\epsilon > 0$, the set

$$B_{n,\epsilon} = \left\{ \|\tilde H\|^2 > n^{\epsilon} \right\},$$

where $\|A\|$ denotes the largest singular value of the matrix A. Note that the matrix $\tilde H$ is better
conditioned than the original channel matrix $H^{(2)}$: all the diagonal elements of $\tilde H \tilde H^*$ are roughly
of the same order (up to a factor log n), and it can be shown that there exists $K'_1 > 0$ such that

$$E\left(\|\tilde H\|^2\right) \le K'_1 (\log n)^2$$

for all n. In Appendix III, we show the following more precise statement.

Lemma 5.3: For any $\epsilon > 0$ and $p \ge 1$, there exists $K'_1 > 0$ such that for all n,

$$P(B_{n,\epsilon}) \le \frac{K'_1}{n^p}.$$

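It follows that

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\log\det\left(I + \tilde H Q(\tilde H)\tilde H^*\right)\right) \le \max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\log\det\left(I + \tilde H Q(\tilde H)\tilde H^*\right) 1_{B_{n,\epsilon}}\right) + \max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\mathrm{Tr}\left(\tilde H Q(\tilde H)\tilde H^*\right) 1_{B^c_{n,\epsilon}}\right). \qquad (9)$$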

The first term in (9) refers to the event that the channel matrix $\tilde H$ is accidentally ill-conditioned.
Since the probability of such an event is polynomially small by Lemma 5.3, the contribution of
this first term is actually negligible. In the second term in (9), the matrix $\tilde H$ is well conditioned,
and this term is actually proportional to the maximum power transfer from left to right. Details
follow below.

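For the first term in (9), we use Hadamard's inequality and obtain

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\log\det\left(I + \tilde H Q(\tilde H)\tilde H^*\right) 1_{B_{n,\epsilon}}\right) \le \max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\sum_{i \in D \setminus V_D} \log\left(1 + \tilde H_i Q(\tilde H) \tilde H_i^*\right) \,\Big|\, B_{n,\epsilon}\right) P(B_{n,\epsilon})$$

where $\tilde H_i$ is the i-th row of $\tilde H$. By Jensen's inequality, this expression in turn is bounded above
by

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} \sum_{i \in D \setminus V_D} \log\left(1 + E\left(\|\tilde H_i\|^2\, \mathrm{Tr}\,Q(\tilde H) \,\big|\, B_{n,\epsilon}\right)\right) P(B_{n,\epsilon})$$

$$\le \max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} \sum_{i \in D \setminus V_D} \log\left(1 + \frac{E\left(\|\tilde H_i\|^2\, \mathrm{Tr}\,Q(\tilde H)\right)}{P(B_{n,\epsilon})}\right) P(B_{n,\epsilon})$$

$$\le n \log\left(1 + \frac{n\, P_{tot}(n)}{P(B_{n,\epsilon})}\right) P(B_{n,\epsilon})$$

since

$$\|\tilde H_i\|^2 = \sum_{k \in S} |\tilde H_{ik}|^2 = \sum_{k \in S} \frac{r_{ik}^{-\alpha}}{d_k} \le n,$$

each term of the last sum being at most 1 because $d_k \ge r_{ik}^{-\alpha}$.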

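The fact that the minimum distance between the nodes in S and $D \setminus V_D$ is at least 1 yields
$P_{tot}(n) \le PG\, n^2$. Noting that $x \mapsto x \log(1 + 1/x)$ is increasing on [0, 1] and using Lemma 5.3,
we obtain finally that for any $p \ge 1$, there exists $K'_1 > 0$ such that

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\log\det\left(I + \tilde H Q(\tilde H)\tilde H^*\right) 1_{B_{n,\epsilon}}\right) \le K'_1\, n^{1-p} \log\left(1 + \frac{PG\, n^{p+3}}{K'_1}\right),$$

which decays polynomially to zero with arbitrary exponent as n tends to infinity.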
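For the second term in (9), we simply have

$$\max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\mathrm{Tr}\left(\tilde H Q(\tilde H)\tilde H^*\right) 1_{B^c_{n,\epsilon}}\right) \le \max_{\substack{Q(\tilde H) \ge 0 \\ E(\mathrm{Tr}\,Q(\tilde H)) \le P_{tot}(n)}} E\left(\|\tilde H\|^2\, \mathrm{Tr}\,Q(\tilde H)\, 1_{B^c_{n,\epsilon}}\right) \le n^{\epsilon}\, P_{tot}(n).$$

The last thing that needs therefore to be checked is the scaling of $P_{tot}(n)$ stated in Lemma 5.2.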



Let us divide the network area into n squarelets of area 1. By Part (a) of Lemma 5.1, there
are no more than log n nodes in each squarelet with high probability. Let us consider grouping
the squarelets of S into $\sqrt{n}$ rectangular areas $S_m$ of height 1 and width $\sqrt{n}$ as shown in Figure
7. Thus, $S = \bigcup_{m=1}^{\sqrt{n}} S_m$. We are interested in bounding above

$$P_{tot}(n) = PG \sum_{k \in S} d_k = PG \sum_{m=1}^{\sqrt{n}} \sum_{k \in S_m} d_k.$$

Fig. 7. The displacement of the nodes inside the squarelets to squarelet vertices, indicated by arrows.



Let us consider

$$\sum_{k \in S_m} d_k = \sum_{k \in S_m} \sum_{i \in D \setminus V_D} r_{ik}^{-\alpha} \qquad (10)$$

for a given m. Note that if we move the points that lie in each squarelet of $S_m$ together with the
nodes in the squarelets of $D \setminus V_D$ onto the squarelet vertex as indicated by the arrows in Figure
7, all the (positive) terms in the summation in (10) can only increase since the displacement can
only decrease the Euclidean distance between the nodes involved. Note that the modification
results in a regular network with at most log n nodes at each squarelet vertex on the left and at
most 2 log n nodes at each squarelet vertex on the right. Applying the same reasoning to all
rectangular slabs $S_m$, $m = 1, \dots, \sqrt{n}$, allows us to conclude that $P_{tot}(n)$ for the random network
is with high probability less than the same quantity computed for a regular network with log n
nodes at each left-hand side vertex and 2 log n nodes at each right-hand side vertex.

The most convenient way to index the node positions in the resulting regular network is to
use double indices. The left-hand side nodes are located at positions $(-k_x + 1, k_y)$ and those on
the right at positions $(i_x, i_y)$, where $k_x, k_y, i_x, i_y = 1, \dots, \sqrt{n}$, so that

$$\tilde H_{ik} = \frac{e^{j\theta_{ik}}}{\left((i_x + k_x - 1)^2 + (i_y - k_y)^2\right)^{\alpha/4} \sqrt{d_{k_x,k_y}}}$$

and

$$d_{k_x,k_y} = \sum_{i_x, i_y = 1}^{\sqrt{n}} \frac{1}{\left((i_x + k_x - 1)^2 + (i_y - k_y)^2\right)^{\alpha/2}}, \qquad (11)$$

which yields the following upper bound for $P_{tot}(n)$ of the random network,

$$P_{tot}(n) \le 2 (\log n)^2\, PG \sum_{k_x, k_y = 1}^{\sqrt{n}} d_{k_x,k_y}. \qquad (12)$$

The following lemma establishes the scaling of $d_{k_x,k_y}$ defined in (11).

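Lemma 5.4: There exist constants $K'_2, K'_3 > 0$ independent of $k_x$, $k_y$ and n such that

$$d_{k_x,k_y} \le \begin{cases} K'_2 \log n & \text{if } \alpha = 2, \\ K'_2\, k_x^{2-\alpha} & \text{if } \alpha > 2, \end{cases}$$

and

$$d_{k_x,k_y} \ge K'_3\, k_x^{2-\alpha} \quad \text{for } \alpha > 2.$$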



The rigorous proof of the lemma is given at the end of Appendix III. A heuristic way of
thinking about the approximation

$$d_{k_x,k_y} \approx k_x^{2-\alpha}$$

can be obtained through Laplace's principle. The summation in $d_{k_x,k_y}$ scales the same as the
maximum term in the sum times the number of terms which have roughly this maximum value.
The maximum term is of the order of $1/k_x^{\alpha}$. The terms that take on roughly this value are those
for which $i_x$ runs from 1 to the order of $k_x$ and $i_y$ runs from $k_y$ to $k_y$ plus or minus the order
of $k_x$. There are roughly $k_x^2$ such terms. Hence $d_{k_x,k_y} \approx (1/k_x^{\alpha}) \cdot k_x^2 = k_x^{2-\alpha}$.
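We can now use the upper bound given in the above lemma to yield:

$$\sum_{k_x, k_y = 1}^{\sqrt{n}} d_{k_x,k_y} \le \begin{cases} K'_4\, n \log n & \text{if } \alpha = 2, \\ K'_4\, n^{2-\alpha/2} & \text{if } 2 < \alpha < 3, \\ K'_4\, \sqrt{n} \log n & \text{if } \alpha = 3, \\ K'_4\, \sqrt{n} & \text{if } \alpha > 3, \end{cases}$$

for another constant $K'_4 > 0$ independent of n. This upper bound combined with (12) completes
the proof of Lemma 5.2. □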
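
As a rough numerical illustration of Lemma 5.4 and of the bound on the sum of the $d_{k_x,k_y}$, the following Python sketch evaluates the double sum in (11) by brute force on a small regular network. The function names `d_regular`, `total_d` and `growth_exponent`, as well as the grid sizes, are assumptions made for illustration and are not part of the paper. Because the grids are small, the measured growth exponents should only be expected to approximate the asymptotic values $2 - \alpha/2$ (for $2 < \alpha < 3$) and $1/2$ (for $\alpha > 3$).

```python
import math


def d_regular(kx, ky, side, alpha):
    """d_{kx,ky} as in (11): sum over right-hand vertices (ix, iy), 1 <= ix, iy <= side."""
    return sum(
        ((ix + kx - 1) ** 2 + (iy - ky) ** 2) ** (-alpha / 2)
        for ix in range(1, side + 1)
        for iy in range(1, side + 1)
    )


def total_d(side, alpha):
    """Sum of d_{kx,ky} over all left-hand vertices; `side` plays the role of sqrt(n)."""
    return sum(
        d_regular(kx, ky, side, alpha)
        for kx in range(1, side + 1)
        for ky in range(1, side + 1)
    )


def growth_exponent(alpha, side_small=8, side_large=16):
    """Empirical exponent e such that total_d grows roughly like n**e, with n = side**2."""
    ratio = total_d(side_large, alpha) / total_d(side_small, alpha)
    return math.log(ratio) / math.log((side_large / side_small) ** 2)


if __name__ == "__main__":
    for alpha in (2.5, 3.5, 4.0):
        predicted = max(2 - alpha / 2, 0.5)  # asymptotic exponent from the lemma
        measured = growth_exponent(alpha)
        print(f"alpha={alpha}: measured {measured:.2f}, asymptotic {predicted:.2f}")
```

Increasing `side_large` brings the measured exponents closer to the asymptotic ones, at a cost that grows like the fourth power of the grid side.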

VI. Discussions on the Model and the Results
In this section, we point out the scope and limitations of the model and the results.

A. Scaling Laws vs Performance Analysis



We should emphasize that the focus in this paper, as well as in [3] and the follow-up works, is 
on scaling laws, i.e. scaling of the aggregate throughput in the limit when the number of users gets 
large. The main advantage of studying scaling laws is to highlight qualitative and architectural 
properties of the system without getting bogged down by too many details. For example, the 
linear scaling law in dense networks we derived highlights the fact that interference limitation in 
the Gupta-Kumar scaling is not a fundamental one and can be alleviated by more complicated 
physical layer processing. 

It is important to distinguish between such scaling law study and the design and performance 
analysis of a scheme for a network with a given number of users. While scaling law results 
provide some architectural guidelines on how to design schemes that scale well, detailed design 
and performance analysis would involve tuning of many parameters and improvements of the 
scheme to optimize the pre-constant in the system throughput. For example, our scheme quantizes 
the received analog signal at each node and forwards the bits to the final destination, but the
quantized bits are correlated across the receive nodes and hence a reduction in the overhead can 
be achieved by doing some Slepian-Wolf coding. Such work is beyond the scope of the present 
paper. 

We studied two different scaling laws in this paper, one for dense and one for extended 
networks. Given a network with a specific number of nodes occupying a specific area, a natural 
question is: is this network best described by the dense scaling regime or the extended scaling 
regime? What our results say is that a better delineation is in terms of whether we are in 
the degree-of-freedom limited or power(coverage)-limited regime, because this is what will 
have architectural implications for the communication scheme (for example, whether bursty 
transmissions are required). To get a sense of the operating regime a given network is in, our
results suggest a rule-of-thumb quantity that can be calculated: the total received SNR per node,
summed over the transmit powers of all the nodes in the network. If this quantity is much larger
than 0 dB, then the network is in the degree-of-freedom limited regime; otherwise it is in the
power-limited regime.
B. Far-field Assumption in Dense Scaling

One potential concern with the dense scaling is that the far-field assumption will eventually
break down as the number of nodes gets too large. In practice, the typical separation between
nodes is so much larger than the carrier wavelength that the number of nodes for which the
far-field assumption fails is extremely large, i.e. there is a clear separation between the large and
the small spatial scales. Consider the following numerical example: suppose the area of interest
is 1 square km, well within the communication range of many radio devices. With a carrier
frequency of 3 GHz, the carrier wavelength is 0.1 m. Even with a very large system size of
n = 10000 nodes, the typical separation between nearest neighbors is 10 m, very much in the
far-field. Under free-space propagation and assuming unit transmit and receive antenna gains,
the attenuation given by Friis' formula is about $10^{-6}$, much smaller than unity. At the same
time, the total received SNR per node (assuming transmit power P of 1 mW per node, thermal
noise $N_0$ at -174 dBm/Hz, a bandwidth W of 10 MHz and noise figure NF = 10 dB) is 84 dB, very
much in the degree-of-freedom limited regime (see footnote 1). (Looking at even only one point-to-point link
at distance 1 km, the received SNR is 34 dB). Hence, this example gives evidence that there are
networks for which simultaneously the number of nodes is large, the far-field assumption holds 
and the received SNR across the network is high. However, a careful performance analysis of the 
pre-constants is required to confirm that linear scaling of our scheme has already kicked in and 
our scheme indeed outperforms multi-hop in this parameter range. Nevertheless, we do believe 
that the linear scaling obtained here also applies for a relatively small number of nodes. The 
intuition for this is that our strategy relies on the use of MIMO communication, whose linear 
capacity scaling has never been disputed in the range of a small number of antennas. 
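
The geometric part of the numerical example above can be reproduced with a few lines of Python. This is a minimal sketch under the stated free-space assumptions; the helper names `wavelength`, `typical_separation` and `friis_attenuation` are hypothetical and chosen only for illustration. It covers the wavelength, the typical nearest-neighbor separation and the Friis attenuation, and deliberately leaves out the SNR budget.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def wavelength(carrier_hz):
    """Carrier wavelength in metres."""
    return SPEED_OF_LIGHT / carrier_hz


def typical_separation(area_m2, num_nodes):
    """Side of the square that each node occupies on average, in metres."""
    return math.sqrt(area_m2 / num_nodes)


def friis_attenuation(distance_m, carrier_hz, gain_tx=1.0, gain_rx=1.0):
    """Free-space power attenuation G_t * G_r * (lambda / (4 * pi * r))**2."""
    lam = wavelength(carrier_hz)
    return gain_tx * gain_rx * (lam / (4 * math.pi * distance_m)) ** 2


if __name__ == "__main__":
    sep = typical_separation(1e6, 10_000)  # 1 square km, n = 10000 nodes
    print("wavelength [m]:", wavelength(3e9))
    print("typical separation [m]:", sep)
    print("attenuation at that separation:", friis_attenuation(sep, 3e9))
```

The separation exceeds the wavelength by about two orders of magnitude, which is the sense in which the nodes are "very much in the far-field".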



Footnote 1: $\mathrm{SNR}_{dB} = P_{dBm} + 10 \log_{10} n + \mathrm{pathloss}_{dB} - (N_0)_{dBm} - 10 \log_{10} W - \mathrm{NF}_{dB}$.

C. d-dimensional Networks

We have focused on the 2D setting, where the nodes are on the plane, but our results generalize
naturally to networks where nodes live in d-dimensional space. For dense networks, linear scaling
is achievable whenever $\alpha > d$, i.e. whenever spatial reuse is possible. For extended networks,
the scaling exponent $e^*(\alpha)$ is given by:

$$e^*(\alpha) = \begin{cases} 2 - \dfrac{\alpha}{d} & d \le \alpha \le d+1 \\[4pt] 1 - \dfrac{1}{d} & \alpha > d+1. \end{cases}$$

For $\alpha$ between d and d+1, hierarchical MIMO achieves the optimal scaling, and for $\alpha > d+1$,
nearest neighbor multihop is optimal.
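
The piecewise exponent can be written as a small helper, which makes it easy to check that the d-dimensional expression reduces to the 2D results. This is an illustrative sketch that assumes the exponent has the form $2 - \alpha/d$ for $d \le \alpha \le d+1$ and $1 - 1/d$ for $\alpha > d+1$; the function name `scaling_exponent` is hypothetical.

```python
def scaling_exponent(alpha, d=2):
    """Capacity scaling exponent e*(alpha) of an extended network in d dimensions.

    Assumed form: 2 - alpha/d for d <= alpha <= d + 1 (hierarchical MIMO regime),
    and 1 - 1/d for alpha > d + 1 (nearest-neighbor multihop regime).
    """
    if alpha < d:
        raise ValueError("the exponent is only stated for alpha >= d")
    if alpha <= d + 1:
        return 2 - alpha / d
    return 1 - 1 / d


if __name__ == "__main__":
    for alpha in (2.0, 2.5, 3.0, 4.0):
        print(f"d=2, alpha={alpha}: capacity scales like n^{scaling_exponent(alpha):.2f}")
```

For d = 2 this gives linear scaling at $\alpha = 2$, $n^{2-\alpha/2}$ between 2 and 3, and $\sqrt{n}$ beyond 3; the two branches agree at $\alpha = d+1$.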

D. Transport Capacity 

Let us finally mention that a more general measure of network performance has been in- 
troduced in [3]: the transport capacity of a network, defined as the maximum number of bits 
exchanged in the network per second, weighted by their travelled distances. From an upper 
bound on transport capacity, one can easily deduce an upper bound on the aggregate throughput 
for the special case where the source-destination pairs are chosen at random and communicating 
at a common rate, which is the traffic requirement considered in the current paper. But the 
interest in an upper bound on transport capacity lies in the fact that it applies to more general 
communication scenarios. Reciprocally, it has been shown recently in [14] that for a network with 
a random placement of nodes, there is a natural way to deduce an upper bound on transport 
capacity from an upper bound on throughput, by studying cutset bounds over multiple cuts 
(as first suggested in [5]). Applying this technique to the present result leads to the following 
conclusion: the transport capacity $T_c(n)$ of the extended network is upper bounded by

• $T_c(n) \le K'\, n^{5/2 - \alpha/2 + \epsilon}$, for $2 \le \alpha \le 3$,
• $T_c(n) \le K'\, n^{1+\epsilon}$, for $\alpha > 3$,

for any $\epsilon > 0$, where $K' > 0$ is a constant independent of n. Note that these scaling laws for
the transport capacity are also achievable within a factor of $n^{\epsilon}$.



VII. Conclusions

In point-to-point communication, performance is limited by either the power or the degrees
of freedom (bandwidth and number of antennas) available, depending on whether the link is
operating at low or high signal-to-noise ratio. In a network with multiple source-destination
pairs, performance can further be limited by the interference between simultaneous transmission
of information. In this paper, we have shown that by achieving near global MIMO cooperation
between nodes without introducing significant cooperation overhead, interference can be
successfully removed as a limitation, at least as far as scaling laws are concerned. Moreover, such
near-global MIMO cooperation also allows the maximum transfer of energy between all source- 
destination pairs, provided that the path loss across the network is not too much. This implies 
that in degrees-of-freedom limited scenarios, such as in dense networks or extended networks 
with path loss exponent a = 2, the full degrees of freedom in the network can be shared among 
all nodes and a linear capacity scaling can be achieved. In power-limited scenarios but with 
low attenuation, such as extended networks with a between 2 and 3, our scheme achieves the 
optimal (power-limited) capacity scaling law. 
The key ideas behind our scheme are: 

• using MIMO for long-range communication to achieve spatial multiplexing; 

• local transmit and receive cooperation to maximize spatial reuse; 

• setting up the intra-cluster cooperation such that it is yet another digital communication 
problem, but in a smaller network, thus enabling a hierarchical cooperation architecture. 
Appendix I

Linear scaling law for the MIMO gain under fast fading assumption

Proof of Lemma 4.3: The M x M MIMO channel between two clusters S and D is given by
Y = HX + Z, where the $H_{ik}$ are given by the channel model. Recall that $Z = (Z_k)$ is uncorrelated
background noise plus interference at the receiver nodes. Assume that the transmitted signals
$X = (X_k)$ are from an i.i.d. $\sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ randomly chosen codebook with

$$\sigma^2 = \frac{P\, (r_{SD})^{\alpha}}{M}.$$



It is well known that the achievable mutual information is lower bounded by assuming that the
interference-plus-noise Z is i.i.d. Gaussian (see for example Theorem 5 of [15] for a precise
statement and proof of this in the MIMO case). With our transmission strategy in the MIMO
phase, there exist $b > a > 0$, with a and b independent of n, such that $r_{ik}^{-\alpha/2} = r_{SD}^{-\alpha/2}\, \rho_{ik}$, where
all $\rho_{ik}$ lie in the interval [a, b], both in the cases when S and D are neighboring clusters or not.

By assuming perfect channel state information at the receiver side, the mutual information of
the above MIMO channel is given by


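$$I(X; Y, H) \ge E\left(\log\det\left(I + \frac{\sigma^2}{N} HH^*\right)\right) = E\left(\log\det\left(I + \frac{\mathrm{SNR}}{M} FF^*\right)\right), \qquad (13)$$

where $\mathrm{SNR} = PG/N$ (N = total interference-plus-noise power) and $F_{ik} = \rho_{ik} \exp(j\theta_{ik})$. Let $\lambda$ be
chosen uniformly among the M eigenvalues of $\frac{1}{M} FF^*$. The above mutual information may be
rewritten as

$$I(X; Y, H) \ge M\, E\left(\log(1 + \mathrm{SNR}\,\lambda)\right) \ge M \log(1 + \mathrm{SNR}\, t)\, P(\lambda > t),$$

for any $t \ge 0$. By the Paley-Zygmund inequality, if $0 \le t < E(\lambda)$, we have

$$P(\lambda > t) \ge \frac{(E(\lambda) - t)^2}{E(\lambda^2)}.$$

We therefore need to compute both $E(\lambda)$ and $E(\lambda^2)$. We have

$$E(\lambda) = \frac{1}{M} E\left(\mathrm{Tr}\left(\frac{1}{M} FF^*\right)\right) = \frac{1}{M^2} \sum_{i,k=1}^{M} E\left(|F_{ik}|^2\right) = \frac{1}{M^2} \sum_{i,k=1}^{M} \rho_{ik}^2$$

and

$$E(\lambda^2) = \frac{1}{M} E\left(\mathrm{Tr}\left(\frac{1}{M^2} FF^*FF^*\right)\right) = \frac{1}{M^3} \sum_{i,k,l,m=1}^{M} E\left(F_{ik} \overline{F_{lk}} F_{lm} \overline{F_{im}}\right) \le \frac{1}{M^3} \left( \sum_{i,k,m=1}^{M} \rho_{ik}^2 \rho_{im}^2 + \sum_{i,l,k=1}^{M} \rho_{ik}^2 \rho_{lk}^2 \right),$$

so $E(\lambda) \ge a^2$ and $E(\lambda^2) \le 2b^4$. This leads us to the conclusion that for any $t < a^2$, we have

$$I(X; Y, H) \ge M \log(1 + \mathrm{SNR}\, t)\, \frac{(a^2 - t)^2}{2 b^4}. \qquad (14)$$

Choosing e.g. $t = a^2/2$ shows that $I(X; Y, H)$ grows at least linearly with M. □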

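Lemma I.1 (Paley-Zygmund Inequality): Let X be a non-negative random variable such that
$E(X^2) < \infty$. Then for any $t \ge 0$ such that $t < E(X)$, we have

$$P(X \ge t) \ge \frac{(E(X) - t)^2}{E(X^2)}.$$

Proof: By the Cauchy-Schwarz inequality, we have for any $t \ge 0$:

$$E(X\, 1_{X \ge t}) \le \sqrt{E(X^2)\, P(X \ge t)},$$

and also, if $t < E(X)$,

$$E(X\, 1_{X \ge t}) = E(X) - E(X\, 1_{X < t}) \ge E(X) - t > 0.$$

Therefore,

$$P(X \ge t) \ge \frac{\left(E(X\, 1_{X \ge t})\right)^2}{E(X^2)} \ge \frac{(E(X) - t)^2}{E(X^2)}.$$

□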
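
The inequality is easy to sanity-check on a finite distribution. The sketch below is illustrative only: the helper names are hypothetical and the distribution is an arbitrary example. It uses exact rational arithmetic to compare the tail probability $P(X \ge t)$ with the Paley-Zygmund lower bound.

```python
from fractions import Fraction


def moments(dist):
    """Return (E[X], E[X^2]) for a finite distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    second = sum(p * x * x for x, p in dist.items())
    return mean, second


def paley_zygmund_bound(dist, t):
    """Lower bound (E[X] - t)^2 / E[X^2] on P(X >= t), valid for 0 <= t < E[X]."""
    mean, second = moments(dist)
    if not 0 <= t < mean:
        raise ValueError("the bound requires 0 <= t < E[X]")
    return (mean - t) ** 2 / second


def tail_probability(dist, t):
    """Exact P(X >= t)."""
    return sum(p for x, p in dist.items() if x >= t)


if __name__ == "__main__":
    # X uniform on {0, 1, 2, 3}: an arbitrary non-negative example.
    uniform = {x: Fraction(1, 4) for x in range(4)}
    t = 1
    print("P(X >= t)  =", tail_probability(uniform, t))
    print("PZ bound   =", paley_zygmund_bound(uniform, t))
```

The bound is loose in this example, but it only needs to be bounded away from zero in the proof of Lemma 4.3.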

Note that the achievability results in this paper can be extended to the slow fading case,
provided that Lemma 4.3 can be proved in the slow fading setting. In that case, one would
need to show that the expression inside the expectation in (13) concentrates around its mean
exponentially fast in M. However, another difficulty might arise from the lack of averaging of
the phases in the interference term, which leads to a non-spatially decorrelated noise term Z.
Although proving the result might require some technical effort, we believe it holds true, due to
the self-averaging effect of a large number of independent random variables.
Appendix II
Achievable Rates on Quantized Channels

In order to conclude the discussion on the throughput achieved by our scheme, we need to
show that the quantized MIMO channel achieves the same spatial multiplexing gain as the MIMO
channel. In Theorem II.1, we give a simple achievability region for general quantized channels.
Note that a stronger result is established in [16, Theorem 3] that implies Theorem II.1 as a special
case. The required result for the quantized MIMO channel is then found as an easy application
of Theorem II.1. We start by formally defining the general quantized channel problem in a form
that is of interest to us and proceed with several definitions that will be needed in the sequel.

Let us consider a discrete-time memoryless channel with a single input of alphabet $\mathcal{X}$ and M
outputs of respective alphabets $\mathcal{Y}_1, \dots, \mathcal{Y}_M$. The channel is statistically described by a conditional
probability distribution $p(y_1, \dots, y_M | x)$ for each $y_1 \in \mathcal{Y}_1, \dots, y_M \in \mathcal{Y}_M$ and $x \in \mathcal{X}$. The
outputs of the channel are to be followed by quantizers which independently map the output
alphabets $\mathcal{Y}_1, \dots, \mathcal{Y}_M$ to the respective reproduction alphabets $\hat{\mathcal{Y}}_1, \dots, \hat{\mathcal{Y}}_M$. The aim is to recover
the transmitted information through the channel by observing the outputs of the quantizers.
Communication over the channel takes place in the following manner: A message W, drawn
from the index set $\{1, 2, \dots, L\}$, is encoded into a codeword $X^m(W) \in \mathcal{X}^m$, which is received
as M random sequences $(Y_1^m, \dots, Y_M^m) \sim p(y_1^m, \dots, y_M^m | x^m)$ at the outputs of the channel. The
quantizers themselves consist of encoders and decoders, where the j'th encoder describes its
corresponding received sequence $Y_j^m$ by an index $I_j(Y_j^m) \in \{1, 2, \dots, L_j\}$, and decoder j represents
$Y_j^m$ by an estimate $\hat{Y}_j^m(I_j) \in \hat{\mathcal{Y}}_j^m$. The channel decoder then observes the reconstructed sequences
$\hat{Y}_1^m, \dots, \hat{Y}_M^m$ and guesses the index W by an appropriate decoding rule $\hat{W} = g(\hat{Y}_1^m, \dots, \hat{Y}_M^m)$.
An error occurs if $\hat{W}$ is not the same as the index W that was transmitted. The complete
model under investigation is shown in Fig. 8. An $(L; L_1, \dots, L_M; m)$ code for this channel is
a joint $(L, m)$ channel code and M quantization codes $(L_1, m), \dots, (L_M, m)$; more specifically, it
consists of two sets of encoding and decoding functions, the first set being the channel encoding function
$X^m : \{1, 2, \dots, L\} \to \mathcal{X}^m$ and the channel decoding function $g : \hat{\mathcal{Y}}_1^m \times \cdots \times \hat{\mathcal{Y}}_M^m \to \{1, 2, \dots, L\}$,
and the second set consisting of the encoding functions $I_j : \mathcal{Y}_j^m \to \{1, 2, \dots, L_j\}$ and decoding
functions $\hat{Y}_j^m : \{1, 2, \dots, L_j\} \to \hat{\mathcal{Y}}_j^m$ for $j = 1, \dots, M$ used for the quantizations. We define the
(average) probability of error for the $(L; L_1, \dots, L_M; m)$ code by

$$P_e^m = \frac{1}{L} \sum_{k=1}^{L} P\left(\hat{W} \ne k \mid W = k\right).$$

A set of rates $(R; R_1, \dots, R_M)$ is said to be achievable if there exists a sequence of
$(2^{mR}; 2^{mR_1}, \dots, 2^{mR_M}; m)$ codes with $P_e^m \to 0$ as $m \to \infty$. Note that determining achievable
rates $(R; R_1, \dots, R_M)$ is not a trivial problem, since there is a trade-off between maximizing R
and minimizing $R_1, \dots, R_M$.

Fig. 8. The Quantized Channel Problem.

Theorem II.1 (Achievability for the Quantized Channel Problem): Given a probability distribution
$q(x)$ on $\mathcal{X}$ and M conditional probability distributions $q_j(\hat{y}_j | y_j)$, where $y_j \in \mathcal{Y}_j$ and $\hat{y}_j \in \hat{\mathcal{Y}}_j$
and $j = 1, \dots, M$, all rates $(R; R_1, \dots, R_M)$ such that $R < I(X; \hat{Y}_1, \dots, \hat{Y}_M)$ and $R_j > I(Y_j; \hat{Y}_j)$
are achievable. Specifically, given any $\delta > 0$, $q(x)$ and $q_j(\hat{y}_j | y_j)$, together with rates
$R < I(X; \hat{Y}_1, \dots, \hat{Y}_M)$ and $R_j > I(Y_j; \hat{Y}_j)$ for $j = 1, \dots, M$, there exists a
$(2^{mR}; 2^{mR_1}, \dots, 2^{mR_M}; m)$ code such that $P_e^m < \delta$.

Proof: The proof of the theorem for discrete finite-size alphabets relies on a random coding
argument based on the idea of joint (strong) typicality. For the idea of strong typicality and
properties of typical sequences, see [17]. The proof can be outlined as follows. Given $q(x)$,
generate a random channel codebook $C_c$ with $2^{mR}$ codewords, each of length m, independently
from the distribution

$$q(x^m) = \prod_{k=1}^{m} q(x^{(k)}),$$

and call them $X^m(1), X^m(2), \dots, X^m(2^{mR})$. Also generate M quantization codebooks $C_j$,
$j = 1, \dots, M$, each codebook $C_j$ consisting of $2^{mR_j}$ codewords drawn independently from

$$p_j(\hat{y}_j^m) = \prod_{k=1}^{m} \sum_{\substack{x \in \mathcal{X} \\ y_1 \in \mathcal{Y}_1, \dots, y_M \in \mathcal{Y}_M}} q(x)\, p(y_1, \dots, y_M | x)\, q_j(\hat{y}_j^{(k)} | y_j),$$

and index them as $\hat{Y}_j^m(1), \hat{Y}_j^m(2), \dots, \hat{Y}_j^m(2^{mR_j})$. Given the message w, send the codeword $X^m(w)$
through the channel. The channel will yield $Y_1^m, \dots, Y_M^m$. Given the channel output $Y_j^m$ at the
j'th quantizer, choose $i_j$ such that $(Y_j^m, \hat{Y}_j^m(i_j))$ are jointly typical. If there exists no such $i_j$,
declare an error. If the number of codewords in the quantization codebook, $2^{mR_j}$, is greater
than $2^{mI(Y_j; \hat{Y}_j)}$, the probability of finding no such $i_j$ decreases to zero exponentially as m
increases. The probability of failing to find such an index in at least one of the M quantizers
is bounded above, by the union bound, by the sum of M probabilities that decrease
exponentially in m. Given $\hat{Y}_1^m(i_1), \dots, \hat{Y}_M^m(i_M)$ at the channel decoder, choose the unique w such that
$(X^m(w), \hat{Y}_1^m(i_1), \dots, \hat{Y}_M^m(i_M))$ are jointly typical. The fact that $(X^m(w), \hat{Y}_1^m(i_1), \dots, \hat{Y}_M^m(i_M))$
will be jointly typical with high probability can be established by identifying the Markov
chains in the problem and applying the Markov Lemma [17, Lemma 14.8.1] repeatedly. Observing
that $(Y_1^m, \dots, Y_M^m, \hat{Y}_1^m, \dots, \hat{Y}_{j-1}^m) - Y_j^m - \hat{Y}_j^m$ form a Markov chain and recursively applying
the Markov Lemma, we conclude that $(Y_1^m, \dots, Y_M^m, \hat{Y}_1^m(i_1), \dots, \hat{Y}_M^m(i_M))$ are jointly typical with
probability approaching 1 as m increases. Observing that $X^m - (Y_1^m, \dots, Y_M^m) - (\hat{Y}_1^m, \dots, \hat{Y}_M^m)$
form another Markov chain, again by the Markov Lemma we have $(X^m(w), \hat{Y}_1^m(i_1), \dots, \hat{Y}_M^m(i_M))$
jointly typical with high probability. If there is more than one codeword that is jointly
typical with $(\hat{Y}_1^m(i_1), \dots, \hat{Y}_M^m(i_M))$, we declare an error. The probability of having more than
one such sequence will decrease exponentially to zero as m increases, if the number of channel
codewords $2^{mR}$ is less than $2^{mI(X; \hat{Y}_1, \dots, \hat{Y}_M)}$. Hence if $R < I(X; \hat{Y}_1, \dots, \hat{Y}_M)$ and $R_j > I(Y_j; \hat{Y}_j)$,
the probability of error averaged over all codes decreases to zero as $m \to \infty$. This shows the
existence of a code that achieves rates $(R; R_1, \dots, R_M)$ with arbitrarily small probability of error.
The result can be readily extended to discrete-time memoryless channels with continuous
alphabets by standard arguments (see [18, Ch.7]). □

Proof of Lemma 4.4: Now we turn to our original problem: We need to show that it is possible
to encode the observations at the outputs of the MIMO channel at a fixed rate, while preserving
the spatial multiplexing gain of the MIMO channel. This is a direct consequence of Theorem II.1:
Consider the conditional probability densities

$$q_j(\hat{y}_j | y_j) \sim \mathcal{N}_{\mathbb{C}}(y_j, \Delta^2), \quad j = 1, \dots, M,$$

for the quantization process. From Theorem II.1 we know that for any distribution $p(x)$ on the
input space, all rate tuples $(R; R_1, \dots, R_M)$ are simultaneously achievable if

$$R_j > I(Y_j; \hat{Y}_j), \quad j = 1, \dots, M, \quad \text{and} \quad R < I(X; \hat{Y}_1, \dots, \hat{Y}_M),$$

where now $R_j$ is the encoding rate of the j'th stream and R is the total transmission rate over
the MIMO channel. Using Lemma 4.3 we have that $I(Y_j; \hat{Y}_j) \le \log(1 + P_2/\Delta^2)$ for any probability
distribution $p(x)$ on the input space. So if we choose

$$R_j = \log\left(1 + \frac{P_2}{\Delta^2}\right) + \epsilon \quad \forall j = 1, \dots, M$$

for some $\epsilon > 0$, all rates

$$R < I(X; \hat{Y}_1, \dots, \hat{Y}_M)$$

are achievable on the quantized MIMO channel for any input distribution $p(x)$. Note that now
the channel from X to $\hat{Y}_1, \dots, \hat{Y}_M$ is given by

$$\hat{Y} = HX + Z + D$$

where $D \sim \mathcal{N}_{\mathbb{C}}(0, \Delta^2 I)$. Obviously, this channel has the same spatial multiplexing gain as the
original MIMO channel. □

Proof of Lemma Consider the case where the MIMO signals are corrupted by interference 
of increasing power $K_I \log M$. In this case, the power received by the destination nodes is not
bounded anymore and increases as $P_2 + K_I \log M$ with increasing M. In order to apply the
technique employed in the proof of Lemma 4.4, one can first normalize the received signal
by multiplying it by $\eta = \sqrt{P_2 / (P_2 + K_I \log M)}$ and then do the quantization as before. Note that the
resultant scaled quantized MIMO channel is given by

$$\hat{Y} = \eta\,(HX + Z) + D$$

where again $D \sim \mathcal{N}_{\mathbb{C}}(0, \Delta^2 I)$ and $Z = (Z_k)$ is the background noise plus interference vector,
independent of the signal, with uncorrelated entries of power $E[|Z_k|^2] \le N_0 + K_I \log M$. Thus we
can apply the result of Lemma 4.3. Note that the resultant signal-to-noise ratio satisfies
$\mathrm{SNR} \ge K / \log M$ for a constant $K > 0$. Plugging this SNR expression into (14) yields $M / \log M$
capacity scaling for the resultant channel. □
Appendix III

Largest eigenvalue behaviour of the equalized channel matrix $\tilde H$

In this appendix, we give the proofs of Lemma 5.3 and Lemma 5.4. We start with Lemma 5.3.
The proof of the second lemma is given at the end of the section.

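Proof of Lemma 5.3: Let us start by considering the $2m$-th moment of the spectral norm of $\tilde H$
given by (see [19, Ch. 5])

$$\|\tilde H\|^{2m} = \lim_{l \to \infty} \left( \mathrm{Tr}\left( (\tilde H^* \tilde H)^l \right) \right)^{m/l}.$$

By the dominated convergence theorem and Jensen's inequality, we have

$$E\left(\|\tilde H\|^{2m}\right) \le \lim_{l \to \infty} \left\{ E\left( \mathrm{Tr}\left( (\tilde H^* \tilde H)^l \right) \right) \right\}^{m/l}.$$

In the subsequent paragraphs, we will prove that the following upper bound holds with high
probability,

$$E\left( \mathrm{Tr}\left( (\tilde H^* \tilde H)^l \right) \right) \le t_l\, n\, (K'_1 \log n)^{2l} \qquad (15)$$

where $t_l = \frac{(2l)!}{l!\,(l+1)!}$ are the Catalan numbers and $K'_1 > 0$ is a constant independent of n. By
Chebyshev's inequality, this allows us to conclude that for any m,

$$P(B_{n,\epsilon}) \le \frac{E\left(\|\tilde H\|^{2m}\right)}{n^{m\epsilon}} \le \frac{4^m (K'_1 \log n)^{2m}}{n^{m\epsilon}}$$

since $\lim_{l \to \infty} t_l^{1/l} = 4$. For any $\epsilon > 0$, choosing m sufficiently large shows therefore that $P(B_{n,\epsilon})$
decays polynomially with arbitrary exponent as $n \to \infty$, which is the result stated in Lemma 5.3.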
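
The two facts about Catalan numbers used here, the closed form $t_l = (2l)!/(l!\,(l+1)!)$ and the limit $t_l^{1/l} \to 4$, can be checked numerically. The sketch below is for illustration only, and the function names are hypothetical. It computes the $l$-th root through logarithms, since converting a very large integer to a float would overflow.

```python
import math


def catalan(l):
    """l-th Catalan number, (2l)! / (l! (l+1)!), computed as C(2l, l) / (l + 1)."""
    return math.comb(2 * l, l) // (l + 1)


def catalan_root(l):
    """t_l ** (1/l), evaluated via logarithms so that large t_l does not overflow."""
    return math.exp(math.log(catalan(l)) / l)


if __name__ == "__main__":
    print([catalan(l) for l in range(1, 6)])
    for l in (10, 100, 1000):
        print(l, catalan_root(l))
```

The roots increase slowly towards 4 from below, because $t_l$ behaves like $4^l$ divided by a polynomial factor in l.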
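There remains to prove the upper bound in (15). Expanding the expression gives

$$E\left( \mathrm{Tr}\left( (\tilde H^* \tilde H)^l \right) \right) = \sum_{\substack{i_1, \dots, i_l \in D \setminus V_D \\ k_1, \dots, k_l \in S}} E\left( \overline{\tilde H_{i_1 k_1}} \tilde H_{i_1 k_2} \overline{\tilde H_{i_2 k_2}} \tilde H_{i_2 k_3} \cdots \overline{\tilde H_{i_l k_l}} \tilde H_{i_l k_1} \right). \qquad (16)$$

Recall that the random variables $\tilde H_{ik}$ are independent and zero-mean, so the expectation is only
non-zero when the terms in the product form conjugate pairs. Let us consider the case $l = 2$ as
an example. We have

$$E\left( \mathrm{Tr}\left( (\tilde H^* \tilde H)^2 \right) \right) = \sum_{\substack{i_1, i_2 \in D \setminus V_D \\ k_1, k_2 \in S}} E\left( \overline{\tilde H_{i_1 k_1}} \tilde H_{i_1 k_2} \overline{\tilde H_{i_2 k_2}} \tilde H_{i_2 k_1} \right) \qquad (17)$$

$$= \sum_{\substack{i_1, i_2 \in D \setminus V_D \\ k \in S}} |\tilde H_{i_1 k}|^2 |\tilde H_{i_2 k}|^2 + \sum_{\substack{i \in D \setminus V_D \\ k_1 \ne k_2 \in S}} |\tilde H_{i k_1}|^2 |\tilde H_{i k_2}|^2 \qquad (18)$$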
since the expectation is non-zero only when either $k_1 = k_2 = k$ or $i_1 = i_2 = i$. Note that we
have removed the expectations in (18) since $|\tilde H_{ik}|^2$ is a deterministic quantity in our case. The
expression can be bounded above by

$$E\left( \mathrm{Tr}\left( (\tilde H^* \tilde H)^2 \right) \right) \le \sum_{\substack{i_1, i_2 \in D \setminus V_D \\ k \in S}} |\tilde H_{i_1 k}|^2 |\tilde H_{i_2 k}|^2 + \sum_{\substack{i \in D \setminus V_D \\ k_1, k_2 \in S}} |\tilde H_{i k_1}|^2 |\tilde H_{i k_2}|^2 \qquad (19)$$

where we now double count the terms with $i_1 = i_2 = i$ and $k_1 = k_2 = k$, that is, the terms of
the form $|\tilde H_{ik}|^4$.

Fig. 9. The product in Eq. (17) illustrated as a ring.
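
The $l = 2$ computation can be verified exactly on a tiny matrix. The sketch below is illustrative and rests on a simplifying assumption: each phase is drawn uniformly from the four values $\{1, j, -1, -j\}$ rather than from the continuous uniform distribution, which is enough here because only the independence and the zero mean of the entries are used. All function names are hypothetical. The code enumerates every phase assignment, averages $\mathrm{Tr}((H^*H)^2)$, and compares the result with the closed form (18) and the upper bound (19).

```python
import itertools

# Four equally likely phases; like a uniform phase they satisfy E[exp(j*theta)] = 0.
PHASES = (1, 1j, -1, -1j)


def exact_trace_moment(mag):
    """E Tr((H*H)^2) for H_ik = mag[i][k] * phase_ik, averaged over all phase choices."""
    rows, cols = len(mag), len(mag[0])
    total, count = 0.0, 0
    for phases in itertools.product(PHASES, repeat=rows * cols):
        h = [[mag[i][k] * phases[i * cols + k] for k in range(cols)] for i in range(rows)]
        # Tr((H*H)^2) equals the sum of |(H*H)_{k1 k2}|^2 because H*H is Hermitian.
        for k1 in range(cols):
            for k2 in range(cols):
                gram = sum(h[i][k1].conjugate() * h[i][k2] for i in range(rows))
                total += abs(gram) ** 2
        count += 1
    return total / count


def closed_form_18(mag):
    """Right-hand side of (18): second sum restricted to k1 != k2."""
    rows, cols = len(mag), len(mag[0])
    p = [[m * m for m in row] for row in mag]
    first = sum(p[i1][k] * p[i2][k]
                for k in range(cols) for i1 in range(rows) for i2 in range(rows))
    second = sum(p[i][k1] * p[i][k2]
                 for i in range(rows) for k1 in range(cols) for k2 in range(cols) if k1 != k2)
    return first + second


def upper_bound_19(mag):
    """Right-hand side of (19): same sums, with the terms |H_ik|^4 counted twice."""
    fourth_powers = sum(m ** 4 for row in mag for m in row)
    return closed_form_18(mag) + fourth_powers


if __name__ == "__main__":
    magnitudes = [[1.0, 0.5], [2.0, 1.5]]  # arbitrary example magnitudes |H_ik|
    print(exact_trace_moment(magnitudes), closed_form_18(magnitudes), upper_bound_19(magnitudes))
```

The gap between (19) and (18) is exactly the sum of the fourth powers $|\tilde H_{ik}|^4$, which is the double counting mentioned in the text.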

The non-vanishing terms in the sum in (17) can also be determined by the following approach,
which generalizes to larger l: let each index be associated to a vertex and each term in the product
in (17) to an edge between its corresponding vertices. Note that the resulting graph is in general
a ring with 4 edges as depicted in Figure 9. A term in the summation in (17) is only non-zero if
each edge of its corresponding graph has even multiplicity. Such a graph can be obtained from
the ring in Figure 9 by merging some of the vertices, thus equating their corresponding indices.
For example, merging the vertices $k_1$ and $k_2$ into a single vertex k gives the graph in Figure
10-a; on the other hand, merging $i_1$ and $i_2$ into a single vertex i gives Figure 10-b. Note that in
the first figure $i_1, i_2$ can take values in $D \setminus V_D$ and k can take values in S, thus the sum of all
such terms yields

$$\sum_{\substack{i_1, i_2 \in D \setminus V_D \\ k \in S}} |\tilde H_{i_1 k}|^2 |\tilde H_{i_2 k}|^2. \qquad (20)$$
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fc2 




11*1 = 



l2 = I 



ki = k2 = k 



a) 



b) 



Fig. 10. Two possible graphs corresponding to the non-zero terms in ( 117b . 



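Similarly, the terms of the form in Figure 10-b sum up to

\[
\sum_{\substack{i \in D\setminus V_D \\ k_1,k_2 \in S}} |\hat{H}_{ik_1}|^2 |\hat{H}_{ik_2}|^2 . \tag{21}
\]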



Note that another possible graph composed of edges with even multiplicity can be obtained
by further merging the vertices $i_1$ and $i_2$ into a single vertex $i$ in Figure 10-a, or equivalently
merging $k_1$ and $k_2$ into $k$ in Figure 10-b. This results in a graph with only two vertices
$k$ and $i$ and a quadruple edge in between, which corresponds to terms of the form $|\hat{H}_{ik}|^4$ with
$i \in D \setminus V_D$ and $k \in S$. Note, however, that such terms have already been considered in both (20)
and (21), since we did not exclude the case $i_1 = i_2$ in (20) and $k_1 = k_2$ in (21). In fact, terms
corresponding to any graph with fewer than 3 vertices are already accounted for in
either one of the sums in (20) and (21), or simultaneously in both. Hence, the sum of (20) and
(21) is an upper bound for (17), yielding again (19).

Fig. 11. The product in Eq. (16) illustrated as a ring.

In the general case with $\ell > 2$, considering (16) leads to a larger ring with $2\ell$ edges, as
depicted in Figure 11. Similarly to the case $\ell = 2$, the non-vanishing terms in (16) are those that
correspond to a graph having only edges of even multiplicity. Since each edge must have multiplicity
at least two, such graphs can have at most $\ell$ distinct edges. In turn, a graph with $\ell$ edges can
have at most $\ell + 1$ vertices, which is the case of a tree. Hence, let us first start by considering
such trees; namely, planar trees with $\ell$ branches that are rooted (at $k_1$) and planted, implying
that rotating asymmetric trees around the root results in a new tree. See Figure 12, which depicts
the five possible trees with $\ell = 3$ branches, where we relabel the resultant $\ell + 1 = 4$ vertices as
$p_1, \dots, p_4$. In general, the number of different planar, rooted, planted trees with $\ell$ branches is
given by the $\ell$-th Catalan number $t_\ell$ [20]. In each of these trees, the $\ell + 1$ vertices $p_1, \dots, p_{\ell+1}$
take values in either $D \setminus V_D$ or $S$. Hence, each tree $\mathcal{T}_\ell^i$ corresponds to a group of non-zero
terms,

\[
T_\ell^i = \sum_{p_1,\dots,p_{\ell+1}} f_{\mathcal{T}_\ell^i}(p_1,\dots,p_{\ell+1}), \qquad i = 1,\dots,t_\ell. \tag{22}
\]
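The tree count can be reproduced with a few lines of code. The sketch below uses the standard closed form $(2\ell)!/(\ell!\,(\ell+1)!)$ for the Catalan numbers and compares it with a direct enumeration of planar rooted trees, represented as nested tuples of subtrees; this representation and the function names are illustrative choices, not taken from the paper.

```python
from math import comb


def catalan(l):
    """l-th Catalan number, (2l)! / (l! (l+1)!)."""
    return comb(2 * l, l) // (l + 1)


def planar_rooted_trees(l):
    """Enumerate planar rooted trees with l edges.

    A tree is a tuple of its root's subtrees, in left-to-right order.
    """
    if l == 0:
        return [()]
    trees = []
    # The leftmost subtree uses j edges plus one edge attaching it to the
    # root; the remaining l - 1 - j edges form the rest of the tree.
    for j in range(l):
        for first in planar_rooted_trees(j):
            for rest in planar_rooted_trees(l - 1 - j):
                trees.append((first,) + rest)
    return trees


if __name__ == "__main__":
    for l in range(1, 7):
        print(l, catalan(l), len(planar_rooted_trees(l)))
```

For $\ell = 3$ both counts equal 5, in agreement with the five trees of Figure 12.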

Note that if a non-vanishing term in (16) corresponds to a graph with fewer than $\ell + 1$ vertices,
then the corresponding graph possesses either edges with multiplicity larger than 2 or cycles, and
this term is already accounted for in one or more of the terms in (22). This fact can be
observed by noticing that both edges with large multiplicity and cycles can be untied to
get trees with $\ell$ branches, with some of the $\ell + 1$ indices constrained, however, to share the same
values (see Figure 13). Note that such cases are not excluded in the summations in (22); thus
we have

\[
E\big(\mathrm{Tr}((\hat{H}^*\hat{H})^\ell)\big) \le \sum_{i=1}^{t_\ell} T_\ell^i .
\]

Fig. 12. Planar rooted planted trees with 3 branches. Note that each edge is actually a double edge in our case, although
depicted with a single line in the figure.

Fig. 13. The product in Eq. (16) illustrated as a ring.

Below we show that

\[
T_\ell^i \le n\,(K_1' \log n)^{\ell} \tag{23}
\]

in a regular network and

\[
T_\ell^i \le n\,(K_1' \log n)^{3\ell}, \quad \forall i \tag{24}
\]

with high probability in a random network. We first concentrate on regular networks in order
to reveal the proof idea in the simplest setting. A binning argument then allows us to extend the
result to random networks.

a) Regular network: Recall that in the regular case, the nodes on the left half are located at
positions $(-k_x + 1, k_y)$ and those on the right half at $(i_x, i_y)$ for $k_x, k_y, i_x, i_y = 1, \dots, \sqrt{n}$. In
this case, the matrix elements of $\hat{H}$ are given by

\[
\hat{H}_{ik} = \frac{e^{j\theta_{ik}}}{\big((i_x + k_x - 1)^2 + (i_y - k_y)^2\big)^{\alpha/4}\,\sqrt{d_{k_x,k_y}}}
\]

and

\[
d_{k_x,k_y} = \sum_{i_x,i_y=1}^{\sqrt{n}} \frac{1}{\big((i_x + k_x - 1)^2 + (i_y - k_y)^2\big)^{\alpha/2}} .
\]

Fig. 14. A simple tree with $\ell$ branches.

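In the discussion below, we will need an upper bound on the scaling of $\sum_{i=1}^{n} |\hat{H}_{ik}|^2$ and
$\sum_{k=1}^{n} |\hat{H}_{ik}|^2$. By Lemma 5.4, we have

\[
d_{k_x,k_y} \ge K_3'\, k_x^{2-\alpha}
\]

for a constant $K_3' > 0$ independent of $n$ which, in turn, yields the upper bound

\[
|\hat{H}_{ik}|^2 \le \frac{1}{K_3'}\,\frac{k_x^{\alpha-2}}{\big((i_x + k_x - 1)^2 + (i_y - k_y)^2\big)^{\alpha/2}} \le \frac{1}{K_3'}\,\frac{1}{(i_x + k_x - 1)^2 + (i_y - k_y)^2} .
\]

Summing over either $i$ or $k$, and using the upper bound in Lemma 5.4 for $\alpha = 2$ yields

\[
\sum_{i=1}^{n} |\hat{H}_{ik}|^2,\ \sum_{k=1}^{n} |\hat{H}_{ik}|^2 \le K_1' \log n \tag{25}
\]

where $K_1' = K_2'/K_3'$, with $K_2'$ and $K_3'$ being the constants appearing in the lemma.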

Let us first consider the simplest case, where the tree is composed of $\ell$ height-1 branches
(see Figure 14), and let $T_\ell^i$ be the corresponding term. We have

\begin{align*}
T_\ell^i &= \sum_{p_1,\dots,p_{\ell+1}=1}^{n} |\hat{H}_{p_2p_1}|^2 \cdots |\hat{H}_{p_{\ell+1}p_1}|^2 \\
&= \sum_{p_1=1}^{n} \Big( \sum_{p_2=1}^{n} |\hat{H}_{p_2p_1}|^2 \Big)^{\ell} \\
&\le n\,(K_1' \log n)^{\ell} \tag{26}
\end{align*}

which follows from the upper bound (25).

Now let us consider the general case of an arbitrary tree $\mathcal{T}_\ell^i$ having $s$ leaves, where $1 \le s \le \ell$
(see Figure 15). Let the indices corresponding to these leaves be $m_1, \dots, m_s$. Let us denote
the "parent" vertices of these leaves by $p_1, \dots, p_{s'}$ and assume that $p_1$ is the common parent
vertex of leaves $m_1, \dots, m_{d_1}$; $p_2$ is the common parent vertex of leaves $m_{d_1+1}, \dots, m_{d_2}$, etc.;
and finally $p_{s'}$ is the parent of $m_{d_{s'-1}+1}, \dots, m_s$. The term $T_\ell^i$ corresponding to this tree is given
by

\begin{align*}
T_\ell^i &= \sum_{\substack{m_1,\dots,m_s \\ p_1,\dots,p_{\ell+1-s}}} f_{\mathcal{T}_\ell^i}(p_1,\dots,p_{\ell+1}) \\
&= \sum_{p_1,\dots,p_{\ell+1-s}=1}^{n} f_{\mathcal{T}_{\ell-s}^{i'}}(p_1,\dots,p_{\ell+1-s}) \\
&\qquad \times \sum_{m_1,\dots,m_s=1}^{n} |\hat{H}_{m_1p_1}|^2 \cdots |\hat{H}_{m_{d_1}p_1}|^2\, |\hat{H}_{m_{d_1+1}p_2}|^2 \cdots |\hat{H}_{m_{d_2}p_2}|^2 \cdots |\hat{H}_{m_{d_{s'-1}+1}p_{s'}}|^2 \cdots |\hat{H}_{m_sp_{s'}}|^2 \tag{27} \\
&\le T_{\ell-s}^{i'}\,(K_1' \log n)^{s} \tag{28}
\end{align*}

where $T_{\ell-s}^{i'}$ corresponds to a smaller (and shorter) tree $\mathcal{T}_{\ell-s}^{i'}$ with $\ell - s$ branches.$^1$ The argument
above decreases the height of the tree by 1, hence can be applied recursively to get a simple tree
composed only of height-1 branches, in which case the upper bound in (26) applies. Thus, given
$\mathcal{T}_\ell^i$, let $h$ be the number of recursions to get a simple tree and $s_1, \dots, s_h$ denote the number of
leaves in the trees observed at each step of the recursion. We have

\begin{align*}
T_\ell^i &\le (K_1' \log n)^{s_1} (K_1' \log n)^{s_2} \cdots (K_1' \log n)^{s_h}\, T_{\ell-s_1-\cdots-s_h} \\
&\le n\,(K_1' \log n)^{\ell}
\end{align*}

since $T_{\ell-s_1-\cdots-s_h} \le n\,(K_1' \log n)^{\ell-s_1-\cdots-s_h}$ by (26). Thus, (23) follows.
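The leaf-removal recursion can be illustrated on small examples. The following is a minimal sketch, assuming a nonnegative $n \times n$ matrix `a` that plays the role of $|\hat{H}_{ik}|^2$ (rows indexed by $i$, columns by $k$) and a tree given as a list of child lists with vertex 0 as the root; these data structures and names are illustrative assumptions. It evaluates the tree sum by passing sums from the leaves towards the root, compares the result with brute-force enumeration, and checks the bound $n c^{\ell}$, where $c$ is the largest row or column sum of `a`.

```python
import itertools


def messages(children, a, n, v=0, depth=0):
    """msg[p]: sum over all indices below v of the product of edge weights,
    given that vertex v carries index p (leaf removal, bottom-up)."""
    msg = [1.0] * n
    for c in children[v]:
        sub = messages(children, a, n, c, depth + 1)
        for p in range(n):
            # Even depth: v is a column index k and its child a row index i.
            msg[p] *= sum((a[q][p] if depth % 2 == 0 else a[p][q]) * sub[q]
                          for q in range(n))
    return msg


def tree_sum(children, a, n):
    """Sum over all index assignments, using leaf removal."""
    return sum(messages(children, a, n))


def tree_sum_brute_force(children, a, n):
    """Same sum by direct enumeration of all n^(l+1) index assignments."""
    depth, edges, stack = {0: 0}, [], [0]
    while stack:
        v = stack.pop()
        for c in children[v]:
            depth[c] = depth[v] + 1
            edges.append((v, c))
            stack.append(c)
    total = 0.0
    for idx in itertools.product(range(n), repeat=len(children)):
        term = 1.0
        for v, c in edges:
            term *= a[idx[c]][idx[v]] if depth[v] % 2 == 0 else a[idx[v]][idx[c]]
        total += term
    return total


def max_line_sum(a):
    """Largest row or column sum of a."""
    n = len(a)
    return max(max(sum(row) for row in a),
               max(sum(a[i][k] for i in range(n)) for k in range(n)))


if __name__ == "__main__":
    a = [[0.2, 0.5, 0.1], [0.4, 0.1, 0.3], [0.3, 0.2, 0.6]]  # hypothetical weights
    tree = [[1, 2], [3], [], []]  # root 0 with children 1, 2; vertex 3 below 1
    print(tree_sum(tree, a, 3), tree_sum_brute_force(tree, a, 3),
          3 * max_line_sum(a) ** 3)
```

Each removed leaf contributes at most one factor $c$, which is the mechanism behind (28); the remaining sum over the root index contributes the factor $n$.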

$^1$Note that the term corresponding to a leaf $m$ can be either $|\hat{H}_{mp}|^2$ or $|\hat{H}_{pm}|^2$ depending on whether the height of the leaf
is even or odd. However, in (27), we ignore this issue in order to simplify the notation since the upper bound (28) applies in
both cases.

Fig. 15. A tree with leaves $m_1, m_2, \dots, m_s$.

b) Random network: We denote the locations of the nodes to the left of the cut by $a_k =
(-a_k^x, a_k^y)$, where $a_k^x$ is the x-coordinate and $a_k^y$ is the y-coordinate of node $k \in S$, and those to
the right of the cut are similarly denoted by $b_i = (b_i^x, b_i^y)$ for $i \in D \setminus V_D$. In this case, the matrix
elements of $\hat{H}$ are given by

\[
\hat{H}_{ik} = \frac{e^{j\theta_{ik}}}{\big((a_k^x + b_i^x)^2 + (a_k^y - b_i^y)^2\big)^{\alpha/4}\,\sqrt{d_k}}
\]

and

\[
d_k = \sum_{i \in D\setminus V_D} \frac{1}{\big((a_k^x + b_i^x)^2 + (a_k^y - b_i^y)^2\big)^{\alpha/2}} .
\]

In parallel to the regular case, we will need an upper bound on $\sum_{i \in D\setminus V_D} |\hat{H}_{ik}|^2$ and $\sum_{k \in S} |\hat{H}_{ik}|^2$.

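The upper bound can be obtained in two steps by first showing that

\[
d_k \ge K_3'\,\frac{(a_k^x)^{2-\alpha}}{2\log n} \tag{29}
\]

with high probability for a constant $K_3' > 0$ independent of $n$, which leads to

\[
|\hat{H}_{ik}|^2 \le \frac{2\log n}{K_3'}\,\frac{(a_k^x)^{\alpha-2}}{\big((a_k^x + b_i^x)^2 + (a_k^y - b_i^y)^2\big)^{\alpha/2}} \le \frac{2\log n}{K_3'}\,\frac{1}{(a_k^x + b_i^x)^2 + (a_k^y - b_i^y)^2} \tag{30}
\]

for all $i, k$. This, in turn, yields

\[
\sum_{k \in S} |\hat{H}_{ik}|^2,\ \sum_{i \in D\setminus V_D} |\hat{H}_{ik}|^2 \le K_1' (\log n)^3 \tag{31}
\]

with high probability for another constant $K_1' > 0$ independent of $n$. Recalling the leaf removal
argument discussed for regular networks immediately leads to (24).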



Both the lower bound in (29) and the upper bound in (31) regarding random networks can be
proved using binning arguments that provide the connection to regular networks. In order to prove the lower
bound, we consider Part (b) of Lemma 5.1, while the upper bound (31) is proved using Part (a)
of the same lemma.

Let us first consider dividing the right-half network into squarelets of area $2\log n$. Given a
left-hand side node $k$ located at $(-a_k^x, a_k^y)$, let us move the nodes inside each right-hand side
squarelet onto the squarelet vertex that is farthest from $k$. Since this displacement can only increase
the Euclidean distance between the nodes involved, and since by Part (b) of Lemma 5.1 we
know that there is at least one node inside each squarelet, we have

\[
d_k \ge K_3'\,\frac{(a_k^x)^{2-\alpha}}{2\log n}
\]

by using the lower bound in Lemma 5.4, applied to the resulting regular grid of spacing $\sqrt{2\log n}$.

Now having (30) in hand, in order to show (31), we divide the network into $n$ squarelets of
area 1. By Part (a) of Lemma 5.1, there are at most $\log n$ nodes inside each squarelet. Considering
the argument in Section V and the displacement of the nodes as illustrated in Figure 7 yields a
regular network with at most $2\log n$ nodes at each vertex in the right-half network,

\begin{align*}
\sum_{i \in D\setminus V_D} |\hat{H}_{ik}|^2 &\le \frac{2\log n}{K_3'} \sum_{i \in D\setminus V_D} \frac{1}{(a_k^x + b_i^x)^2 + (a_k^y - b_i^y)^2} \\
&\le \frac{4}{K_3'}\,(\log n)^2 \sum_{i_x,i_y=1}^{\sqrt{n}} \frac{1}{(i_x + k_x)^2 + (i_y - k_y)^2} \\
&\le 4K_1' (\log n)^3
\end{align*}

by employing the upper bound in Lemma 5.4 for $\alpha = 2$. The same bound follows similarly for
$\sum_{k \in S} |\hat{H}_{ik}|^2$, thus the desired result in (31). □
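The two occupancy properties used in the binning argument can be explored with a small simulation. This is a minimal sketch, assuming $n$ nodes placed independently and uniformly at random in a $\sqrt{n} \times \sqrt{n}$ square; the function name, the fixed seed, and the rounding of the squarelet side (so that an integer number of squarelets tiles the square) are illustrative choices and not part of the original proof.

```python
import math
import random


def squarelet_counts(n, area, seed=0):
    """Drop n points uniformly in a sqrt(n) x sqrt(n) square and return the
    number of points in each squarelet.

    The squarelet side is rounded up slightly so that an integer number of
    squarelets tiles the square; each squarelet has area >= `area`.
    """
    rng = random.Random(seed)
    side = math.sqrt(n)
    m = max(1, int(side / math.sqrt(area)))  # squarelets per row
    counts = [[0] * m for _ in range(m)]
    for _ in range(n):
        x, y = rng.random() * side, rng.random() * side
        col = min(int(x * m / side), m - 1)
        row = min(int(y * m / side), m - 1)
        counts[row][col] += 1
    return [c for line in counts for c in line]


if __name__ == "__main__":
    n = 4096
    large = squarelet_counts(n, 2 * math.log(n))  # squarelets of area ~ 2 log n
    unit = squarelet_counts(n, 1.0)               # squarelets of area 1
    print("min occupancy, area 2 log n:", min(large))
    print("max occupancy, area 1:", max(unit), "vs log n =", math.log(n))
```

An empty squarelet of area $2\log n$ occurs with probability of order $n^{-2}$, so the minimum occupancy is at least 1 in essentially every run, while the maximum occupancy of unit squarelets stays on the order of $\log n$.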
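Proof of Lemma 5.4: Both the lower and upper bounds for $d_{k_x,k_y}$ can be obtained by
straightforward manipulations. Recall that

\[
d_{k_x,k_y} = \sum_{i_x,i_y=1}^{\sqrt{n}} \frac{1}{\big((i_x + k_x - 1)^2 + (i_y - k_y)^2\big)^{\alpha/2}} .
\]

The upper bound can be obtained as follows:

\begin{align*}
d_{k_x,k_y} &= \sum_{y=1-k_y}^{\sqrt{n}-k_y}\ \sum_{x=k_x}^{k_x+\sqrt{n}-1} \frac{1}{(x^2 + y^2)^{\alpha/2}} \\
&\le k_x^{-\alpha} + \int_{k_x}^{k_x+\sqrt{n}-1} \frac{1}{x^{\alpha}}\,dx + \int_{1-k_y}^{\sqrt{n}-k_y} \frac{1}{(k_x^2 + y^2)^{\alpha/2}}\,dy + \int_{1-k_y}^{\sqrt{n}-k_y}\int_{k_x}^{k_x+\sqrt{n}-1} \frac{1}{(x^2 + y^2)^{\alpha/2}}\,dx\,dy \\
&\le k_x^{-\alpha} + (1+\pi)\,k_x^{1-\alpha} + \int_{-\pi/2}^{\pi/2}\int_{k_x}^{3\sqrt{n}} \frac{1}{r^{\alpha}}\,r\,dr\,d\theta \\
&= \begin{cases}
k_x^{-\alpha} + (1+\pi)\,k_x^{1-\alpha} + \pi \log r\,\Big|_{k_x}^{3\sqrt{n}} & \text{if } \alpha = 2, \\[4pt]
k_x^{-\alpha} + (1+\pi)\,k_x^{1-\alpha} + \dfrac{\pi}{2-\alpha}\, r^{2-\alpha}\,\Big|_{k_x}^{3\sqrt{n}} & \text{if } \alpha > 2,
\end{cases} \tag{32} \\
&\le \begin{cases}
K_2' \log n & \text{if } \alpha = 2, \\
K_2'\, k_x^{2-\alpha} & \text{if } \alpha > 2,
\end{cases}
\end{align*}

for a constant $K_2' > 0$ independent of $n$, since the dominating terms in (32) are the third ones.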
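The integral upper bound on $d_{k_x,k_y}$ can be checked numerically for small networks. The sketch below assumes the regular-network definition of $d_{k_x,k_y}$ as the sum of $r^{-\alpha}$ over the $\sqrt{n} \times \sqrt{n}$ right-half grid, and evaluates the closed form $k_x^{-\alpha} + (1+\pi)k_x^{1-\alpha}$ plus the polar integral over $k_x \le r \le 3\sqrt{n}$; the function names and the parameter `n_side` (standing for $\sqrt{n}$) are illustrative.

```python
import math


def d_regular(kx, ky, n_side, alpha):
    """d_{kx,ky}: sum of r^(-alpha) over the right-half grid (n_side = sqrt(n))."""
    return sum(
        ((ix + kx - 1) ** 2 + (iy - ky) ** 2) ** (-alpha / 2)
        for ix in range(1, n_side + 1)
        for iy in range(1, n_side + 1)
    )


def integral_upper_bound(kx, n_side, alpha):
    """k^-a + (1 + pi) k^(1-a) + polar integral over kx <= r <= 3*sqrt(n)."""
    head = kx ** (-alpha) + (1 + math.pi) * kx ** (1 - alpha)
    r_max = 3 * n_side
    if alpha == 2:
        return head + math.pi * math.log(r_max / kx)
    return head + math.pi / (alpha - 2) * (kx ** (2 - alpha) - r_max ** (2 - alpha))


if __name__ == "__main__":
    n_side, alpha = 16, 3
    for kx in (1, 4, 16):
        d = d_regular(kx, n_side // 2, n_side, alpha)
        # For alpha > 2 the ratio d / kx^(2 - alpha) should stay bounded.
        print(kx, d, integral_upper_bound(kx, n_side, alpha), d * kx ** (alpha - 2))
```

For $\alpha > 2$ the printed ratio $d_{k_x,k_y}/k_x^{2-\alpha}$ gives a feel for the constants $K_2'$ and $K_3'$ in the lemma, although a finite experiment of this kind is only indicative.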



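The lower bound follows similarly:

\begin{align*}
d_{k_x,k_y} &= \sum_{y=1-k_y}^{\sqrt{n}-k_y}\ \sum_{x=k_x}^{k_x+\sqrt{n}-1} \frac{1}{(x^2 + y^2)^{\alpha/2}} \\
&\ge \int_{0}^{\arctan(1/2)}\int_{\sqrt{2}k_x}^{k_x+\sqrt{n}-1} \frac{1}{r^{\alpha}}\,r\,dr\,d\theta + \frac{x^{1-\alpha}}{\alpha-1}\,\Big|_{k_x}^{k_x+\sqrt{n}-1} \\
&= \begin{cases}
\arctan\!\big(\tfrac{1}{2}\big) \log r\,\Big|_{\sqrt{2}k_x}^{k_x+\sqrt{n}-1} + \dfrac{1}{\alpha-1}\,x^{1-\alpha}\,\Big|_{k_x}^{k_x+\sqrt{n}-1} & \text{if } \alpha = 2, \\[6pt]
\arctan\!\big(\tfrac{1}{2}\big)\,\dfrac{1}{2-\alpha}\, r^{2-\alpha}\,\Big|_{\sqrt{2}k_x}^{k_x+\sqrt{n}-1} + \dfrac{1}{\alpha-1}\,x^{1-\alpha}\,\Big|_{k_x}^{k_x+\sqrt{n}-1} & \text{if } \alpha > 2.
\end{cases} \tag{33}
\end{align*}

So for all $\alpha \ge 2$, we have

\[
d_{k_x,k_y} \ge K_3'\, k_x^{2-\alpha}
\]

where $K_3' > 0$ is a constant independent of $n$, since the dominating terms in (33) are the first
ones. This concludes the proof of the lemma. □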